.. currentmodule:: dugong

.. _coroutines:

=============
Coroutine API
=============

This section assumes some basic familiarity with coroutines. If you
don't know what they are, you are missing out on a lot and should read up
on them right away (e.g. on `Wikipedia <Wikipedia_Coroutine_>`_, `PEP
342`_, `PEP 380`_ and `dabeaz.com`_).
To refresh your memory: coroutines in Python are generators, and are
obtained by calling generator functions (i.e., functions that use
``yield`` in their definition). A coroutine can be resumed by passing
it to the built-in `next` function, or calling its `~generator.send`
method. A coroutine can pass the control flow back to the caller by
:ref:`yielding <yieldexpr>` values using the ``yield``
expression. When the coroutine eventually terminates, the last call to
`next` or `~generator.send` will raise a `StopIteration` exception,
whose *value* attribute holds the return value of the coroutine. A
coroutine *A* may also *yield from* another coroutine *B* using the
``yield from`` expression. In this case, the control flow will pass
between *A*'s caller and *B* until *B* terminates. When *B* has
terminated, its return value becomes the result of the ``yield from``
expression in *A*, and execution continues in *A*.
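
The mechanics described above can be tried out without any network code. The following is a minimal sketch using plain generators; the names ``inner``, ``outer`` and ``drive`` are made up for illustration and are not part of Dugong.

```python
def inner():
    # Hand control back to the caller; the value passed to send()
    # becomes the result of the yield expression
    received = yield 'inner-1'
    yield 'inner-2 got %s' % received
    return 'inner result'

def outer():
    # Delegate to inner(); its return value becomes the value of
    # the yield from expression
    result = yield from inner()
    yield 'outer got %s' % result
    return 'done'

def drive():
    crt = outer()
    seen = [next(crt)]           # runs until the first yield in inner()
    seen.append(crt.send('x'))   # 'x' is delivered to inner()
    seen.append(next(crt))       # inner() terminates, outer() continues
    try:
        next(crt)
    except StopIteration as exc:
        # The return value of outer() travels in the exception
        seen.append(exc.value)
    return seen
```

Note that the caller of ``outer`` never talks to ``inner`` directly: ``yield from`` forwards both the yielded values and the values passed to `~generator.send`.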
In Dugong, a method or function whose name begins with ``co_`` will
return a coroutine. These coroutines are non-blocking. Whenever they
need to perform an I/O operation that would block (i.e., sending data
to the server or receiving data from the server), they yield a
`PollNeeded` instance instead, and expect to be resumed when the
operation can be carried out without blocking.
The `PollNeeded` instance contains information about the I/O request
that the coroutine would like to perform. The `~PollNeeded.fd`
attribute is a file descriptor, and the `~PollNeeded.mask` attribute
is an :ref:`epoll <epoll-objects>` compatible event mask. Therefore, a
very simple way to wait for a coroutine to complete is to use a
`~select.select` loop::

    from select import select, POLLIN, POLLOUT

    # establish connection, send request, read response header

    # Create coroutine
    crt = conn.co_readall()

    try:
        while True:
            # Resume coroutine
            io_req = next(crt)

            # Coroutine has yielded because I/O is not ready,
            # prepare select call
            read_fds = (io_req.fd,) if io_req.mask & POLLIN else ()
            write_fds = (io_req.fd,) if io_req.mask & POLLOUT else ()

            # Wait for I/O readiness
            select(read_fds, write_fds, ())

    except StopIteration as exc:
        # Coroutine has completed, retrieve result
        body = exc.value

This loop is in fact fully equivalent to a simple ::

    body = conn.readall()

so in this case there was not much point in using a coroutine.
Coroutines only pay off when more than one of them is active at the
same time. However, in that case the necessary loop construction
becomes a lot more complicated. Fortunately, Dugong is compatible with
the `asyncio` module, so you can use the asyncio event loop to
schedule your Dugong coroutines.

Using asyncio Event-Loops
=========================
In order to schedule a Dugong coroutine in an asyncio event loop, you
have to create an `asyncio.Future` for the coroutine. This is done
with the `dugong.AioFuture` class (which inherits from
`asyncio.Future`). The reason for this additional wrapper is that the
asyncio event loop, even though very powerful, does not know how to
interpret the `PollNeeded` instances that are yielded by Dugong
coroutines. It would have been possible to have Dugong coroutines
yield `asyncio.Future` instances directly, but this would have meant
introducing a hard dependency on asyncio, which was deemed
undesirable.
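
To make the role of such a wrapper more concrete, here is a rough sketch of how a future could translate yielded I/O requests into event loop callbacks. This is not Dugong's actual implementation. ``SketchFuture`` and ``FakePollNeeded`` are hypothetical names, the sketch only looks at the ``fd`` and ``mask`` attributes, it ignores cancellation, and it assumes a Unix event loop that supports ``add_reader`` and ``add_writer``.

```python
import asyncio
import socket
from collections import namedtuple
from select import POLLIN, POLLOUT

# Hypothetical stand-in for PollNeeded; only fd and mask are used
FakePollNeeded = namedtuple('FakePollNeeded', ['fd', 'mask'])

class SketchFuture(asyncio.Future):
    """Future that drives a generator yielding fd/mask requests."""

    def __init__(self, crt, loop):
        super().__init__(loop=loop)
        self._crt = crt
        self._io_loop = loop
        # Start the coroutine once the event loop is running
        loop.call_soon(self._resume)

    def _resume(self, unregister=None):
        # Stop watching the fd that woke us up
        if unregister is not None:
            unregister()
        try:
            io_req = next(self._crt)
        except StopIteration as exc:
            # Coroutine finished, its return value becomes our result
            self.set_result(exc.value)
            return
        except Exception as exc:
            self.set_exception(exc)
            return
        # Ask the event loop to call us again when the fd is ready
        loop = self._io_loop
        if io_req.mask & POLLIN:
            loop.add_reader(io_req.fd, self._resume,
                            lambda: loop.remove_reader(io_req.fd))
        elif io_req.mask & POLLOUT:
            loop.add_writer(io_req.fd, self._resume,
                            lambda: loop.remove_writer(io_req.fd))
        else:
            self.set_exception(ValueError('unsupported event mask'))

def read_five(sock):
    # Wait until the socket is readable, then read without blocking
    yield FakePollNeeded(sock.fileno(), POLLIN)
    return sock.recv(5)

def demo():
    loop = asyncio.new_event_loop()
    (a, b) = socket.socketpair()
    try:
        b.sendall(b'hello')
        return loop.run_until_complete(SketchFuture(read_five(a), loop))
    finally:
        a.close()
        b.close()
        loop.close()
```

The coroutine itself never imports asyncio. Only the wrapper knows about the event loop, which is the decoupling that the paragraph above describes.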
Using asyncio, the select-based example from the previous section
becomes much simpler::

    import asyncio
    import atexit
    from dugong import AioFuture

    # establish connection, send request, read response header

    # Create coroutine
    crt = conn.co_readall()

    # Get an event loop instance from the asyncio module to switch
    # between the coroutines as needed
    loop = asyncio.get_event_loop()
    atexit.register(loop.close)

    # Create and schedule asyncio future
    fut = AioFuture(crt, loop=loop)

    # Run the event loop
    loop.run_until_complete(fut)

    # Get the result returned by the coroutine
    body = fut.result()

The generalization to multiple coroutines is now
straightforward. Suppose you want to retrieve a number of documents
from different servers. You could use threads, but this makes the
program hard to debug, and probably most of the time the threads will
be waiting for data from the server, so there is no real need to have
a truly parallel program. In this situation, coroutines are a much
better choice. They allow you to send and receive multiple requests
simultaneously, but the program flow itself is still strictly
sequential. Here's how to do it (suppose the URLs you'd like to
retrieve are stored in *url_list*)::

    import asyncio
    import atexit
    from urllib.parse import urlsplit, urlunsplit
    from dugong import HTTPConnection, AioFuture

    def get_url(host, port, path):
        conn = HTTPConnection(host, port=port)
        yield from conn.co_send_request('GET', path)
        resp = yield from conn.co_read_response()
        assert resp.status == 200
        body = yield from conn.co_readall()
        return body

    futures = []
    for url in url_list:
        o = urlsplit(url)
        # Path is obtained by removing scheme, hostname and fragment
        # identifier from the url
        path = urlunsplit(('', '') + o[2:4] + ('',))

        # Create a coroutine and future for each URL
        futures.append(AioFuture(get_url(o.hostname, o.port, path)))

    # Run coroutines
    loop = asyncio.get_event_loop()
    atexit.register(loop.close)
    loop.run_until_complete(asyncio.wait(futures))

    # Get the results
    bodies = [ x.result() for x in futures ]

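
The ``urlunsplit`` expression used to build the request path is rather terse. As a small illustration that relies only on the standard library, the hypothetical helper ``split_target`` below shows what the three values passed to ``get_url`` look like for a sample URL (the URL itself is made up).

```python
from urllib.parse import urlsplit, urlunsplit

def split_target(url):
    """Return (hostname, port, path) for *url*."""
    o = urlsplit(url)
    # o[2:4] is (path, query); scheme, netloc and fragment are blanked
    path = urlunsplit(('', '') + o[2:4] + ('',))
    return (o.hostname, o.port, path)

target = split_target('http://example.com:8080/a/b?x=1#frag')
# target is ('example.com', 8080, '/a/b?x=1')
```

If the URL does not specify a port, ``o.port`` is `None`.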
When to invoke `AioFuture`
--------------------------
When creating your own coroutines, you generally have two choices:

#. You can create asyncio style coroutines, in which you wrap calls to
   Dugong coroutines into `AioFuture`, e.g.::

     # ...
     @asyncio.coroutine
     def do_stuff():
         # ...
         yield from AioFuture(conn.co_read_response())
         # ...
         buf = yield from AioFuture(conn.co_read(8192))
         # ...
         # May also call other asyncio compatible coroutines:
         yield from asyncio.sleep(1)
         # ...

     # Call the function to obtain the coroutine object
     task = asyncio.Task(do_stuff())
     loop.run_until_complete(task)

   The advantage of this style is that even though you need to wrap
   every Dugong call into `AioFuture`, you can freely mix Dugong and
   other asyncio compatible coroutines.

#. You create Dugong style coroutines, and wrap them into `AioFuture`
   just before adding them to the asyncio event loop, e.g.::

     # ...
     def do_stuff():
         # ...
         yield from conn.co_read_response()
         # ...
         buf = yield from conn.co_read(8192)
         # ...
         # Other coroutines must yield PollNeeded instances, so
         # we cannot yield from asyncio compatible coroutines:
         #yield from asyncio.sleep(1) # WON'T WORK!

     fut = AioFuture(do_stuff())
     loop.run_until_complete(fut)

   The advantage of this is that you need to call `AioFuture` only
   once. The disadvantage is that you cannot yield from other asyncio
   coroutines in your coroutine.

Generally it's recommended to use the style that produces more
readable code.
Building your own Event-Loop
============================
As explained before, the easiest way to schedule coroutines is to use
the asyncio module. However, Dugong coroutines have a well-defined
interface, and you can just as well write your own coroutine
scheduling loop. In this case, the asyncio module is not used at all.
Below is a simple example that uses this technique to switch execution
between two coroutines that send requests and read responses. The code
tries to retrieve a number of documents (stored in *path_list*),
stores the missing paths in *missing_documents*, and saves the
contents of the existing documents to disk. ::

    from select import select, POLLIN, POLLOUT
    from dugong import HTTPConnection

    # Note: in a real application, don't forget to ensure that
    # conn.disconnect() is called eventually
    conn = HTTPConnection('somehost.com')
    missing_documents = []

    # This function returns a coroutine that sends all requests
    def send_requests():
        for path in path_list:
            yield from conn.co_send_request('GET', path)

    # This function returns a coroutine that reads all responses
    def read_responses():
        for (i, path) in enumerate(path_list):
            resp = yield from conn.co_read_response()
            if resp.status != 200:
                missing_documents.append(resp.path)
                # Read and drop the error body so that the next
                # response can be read
                yield from conn.co_readall()
                continue
            with open('doc_%i.dat' % i, 'wb') as fh:
                buf = yield from conn.co_readall()
                fh.write(buf)

    # Create coroutines
    send_request_crt = send_requests()
    read_response_crt = read_responses()

    while True:
        # Resume sending until the coroutine blocks or finishes
        write_fds = ()
        if send_request_crt:
            try:
                io_req_1 = next(send_request_crt)
            except StopIteration:
                # All requests sent
                send_request_crt = None
            else:
                assert io_req_1.mask == POLLOUT
                write_fds = (io_req_1.fd,)

        # Resume reading until the coroutine blocks or finishes
        try:
            io_req_2 = next(read_response_crt)
        except StopIteration:
            # All responses read
            break
        assert io_req_2.mask == POLLIN

        # Wait for fds to become ready for I/O
        select((io_req_2.fd,), write_fds, ())

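
The same idea extends to an arbitrary number of coroutines. The following is a minimal sketch of such a scheduler, not part of Dugong: ``run_all`` and ``FakePollNeeded`` are hypothetical names, the sketch relies only on the ``fd`` and ``mask`` attributes described earlier, and the demonstration uses local socket pairs instead of HTTP connections so that it can be run on its own.

```python
import socket
from collections import namedtuple
from select import select, POLLIN, POLLOUT

# Hypothetical stand-in for PollNeeded; only fd and mask are used
FakePollNeeded = namedtuple('FakePollNeeded', ['fd', 'mask'])

def run_all(coroutines):
    """Run all coroutines to completion, return their results in order."""
    results = [None] * len(coroutines)
    # Coroutines that can be resumed right away, as (index, coroutine)
    ready = list(enumerate(coroutines))
    # Coroutines waiting for I/O, as (index, coroutine, io_request)
    waiting = []
    while ready or waiting:
        for (i, crt) in ready:
            try:
                io_req = next(crt)
            except StopIteration as exc:
                results[i] = exc.value
            else:
                waiting.append((i, crt, io_req))
        ready = []
        if not waiting:
            break
        # Wait until at least one of the requested fds is ready
        read_fds = [req.fd for (_, _, req) in waiting if req.mask & POLLIN]
        write_fds = [req.fd for (_, _, req) in waiting if req.mask & POLLOUT]
        (readable, writable, _) = select(read_fds, write_fds, ())
        # Only resume the coroutines whose fd became ready
        still_waiting = []
        for (i, crt, req) in waiting:
            if ((req.mask & POLLIN and req.fd in readable)
                    or (req.mask & POLLOUT and req.fd in writable)):
                ready.append((i, crt))
            else:
                still_waiting.append((i, crt, req))
        waiting = still_waiting
    return results

def recv_exactly(sock, n):
    # Toy coroutine: yield until readable, then read what is available
    buf = b''
    while len(buf) < n:
        yield FakePollNeeded(sock.fileno(), POLLIN)
        buf += sock.recv(n - len(buf))
    return buf

def demo():
    (a1, b1) = socket.socketpair()
    (a2, b2) = socket.socketpair()
    try:
        b1.sendall(b'spam')
        b2.sendall(b'eggs!')
        return run_all([recv_exactly(a1, 4), recv_exactly(a2, 5)])
    finally:
        for sock in (a1, b1, a2, b2):
            sock.close()
```

Unlike the two-coroutine loop above, this scheduler resumes a coroutine only after its own file descriptor has been reported ready, which avoids needless wake-ups when many coroutines are active.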
.. _Wikipedia_Coroutine: http://en.wikipedia.org/wiki/Coroutine
.. _`PEP 342`: http://legacy.python.org/dev/peps/pep-0342/
.. _`PEP 380`: http://legacy.python.org/dev/peps/pep-0380/
.. _`dabeaz.com`: http://dabeaz.com/coroutines/