File: asyncio.rst

package info (click to toggle)
python-scrapy 2.12.0-2
  • links: PTS, VCS
  • area: main
  • in suites: trixie
  • size: 5,832 kB
  • sloc: python: 50,526; xml: 199; makefile: 86; sh: 7
file content (146 lines) | stat: -rw-r--r-- 4,983 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
.. _using-asyncio:

=======
asyncio
=======

.. versionadded:: 2.0

Scrapy has partial support for :mod:`asyncio`. After you :ref:`install the
asyncio reactor <install-asyncio>`, you may use :mod:`asyncio` and
:mod:`asyncio`-powered libraries in any :doc:`coroutine <coroutines>`.


.. _install-asyncio:

Installing the asyncio reactor
==============================

To enable :mod:`asyncio` support, set the :setting:`TWISTED_REACTOR` setting to
``'twisted.internet.asyncioreactor.AsyncioSelectorReactor'``.

If you are using :class:`~scrapy.crawler.CrawlerRunner`, you also need to
install the :class:`~twisted.internet.asyncioreactor.AsyncioSelectorReactor`
reactor manually. You can do that using
:func:`~scrapy.utils.reactor.install_reactor`::

    install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')


.. _asyncio-preinstalled-reactor:

Handling a pre-installed reactor
================================

``twisted.internet.reactor`` and some other Twisted imports install the default
Twisted reactor as a side effect. Once a Twisted reactor is installed, it is
not possible to switch to a different reactor at run time.

If you :ref:`configure the asyncio Twisted reactor <install-asyncio>` and, at
run time, Scrapy complains that a different reactor is already installed,
chances are you have some such imports in your code.

You can usually fix the issue by moving those offending module-level Twisted
imports to the method or function definitions where they are used. For example,
if you have something like:

.. code-block:: python

    from twisted.internet import reactor


    def my_function():
        reactor.callLater(...)

Switch to something like:

.. code-block:: python

    def my_function():
        from twisted.internet import reactor

        reactor.callLater(...)

Alternatively, you can try to :ref:`manually install the asyncio reactor
<install-asyncio>`, with :func:`~scrapy.utils.reactor.install_reactor`, before
those imports happen.


.. _asyncio-await-dfd:

Awaiting on Deferreds
=====================

When the asyncio reactor isn't installed, you can await on Deferreds in the
coroutines directly. When it is installed, this is not possible anymore, due to
specifics of the Scrapy coroutine integration (the coroutines are wrapped into
:class:`asyncio.Future` objects, not into
:class:`~twisted.internet.defer.Deferred` directly), and you need to wrap them into
Futures. Scrapy provides two helpers for this:

.. autofunction:: scrapy.utils.defer.deferred_to_future
.. autofunction:: scrapy.utils.defer.maybe_deferred_to_future
.. tip:: If you need to use these functions in code that aims to be compatible
         with lower versions of Scrapy that do not provide these functions,
         down to Scrapy 2.0 (earlier versions do not support
         :mod:`asyncio`), you can copy the implementation of these functions
         into your own code.


.. _enforce-asyncio-requirement:

Enforcing asyncio as a requirement
==================================

If you are writing a :ref:`component <topics-components>` that requires asyncio
to work, use :func:`scrapy.utils.reactor.is_asyncio_reactor_installed` to
:ref:`enforce it as a requirement <enforce-component-requirements>`. For
example:

.. code-block:: python

    from scrapy.utils.reactor import is_asyncio_reactor_installed


    class MyComponent:
        def __init__(self):
            if not is_asyncio_reactor_installed():
                raise ValueError(
                    f"{MyComponent.__qualname__} requires the asyncio Twisted "
                    f"reactor. Make sure you have it configured in the "
                    f"TWISTED_REACTOR setting. See the asyncio documentation "
                    f"of Scrapy for more information."
                )


.. _asyncio-windows:

Windows-specific notes
======================

The Windows implementation of :mod:`asyncio` can use two event loop
implementations, :class:`~asyncio.ProactorEventLoop` (default) and
:class:`~asyncio.SelectorEventLoop`. However, only
:class:`~asyncio.SelectorEventLoop` works with Twisted.

Scrapy changes the event loop class to :class:`~asyncio.SelectorEventLoop`
automatically when you change the :setting:`TWISTED_REACTOR` setting or call
:func:`~scrapy.utils.reactor.install_reactor`.

.. note:: Other libraries you use may require
          :class:`~asyncio.ProactorEventLoop`, e.g. because it supports
          subprocesses (this is the case with `playwright`_), so you cannot use
          them together with Scrapy on Windows (but you should be able to use
          them on WSL or native Linux).

.. _playwright: https://github.com/microsoft/playwright-python


.. _using-custom-loops:

Using custom asyncio loops
==========================

You can also use custom asyncio event loops with the asyncio reactor. Set the
:setting:`ASYNCIO_EVENT_LOOP` setting to the import path of the desired event
loop class to use it instead of the default asyncio event loop.