File: components.rst

.. _topics-components:

==========
Components
==========

A Scrapy component is any class whose objects are built using
:func:`~scrapy.utils.misc.build_from_crawler`.

That includes the classes that you may assign to the following settings:

-   :setting:`DNS_RESOLVER`

-   :setting:`DOWNLOAD_HANDLERS`

-   :setting:`DOWNLOADER_CLIENTCONTEXTFACTORY`

-   :setting:`DOWNLOADER_MIDDLEWARES`

-   :setting:`DUPEFILTER_CLASS`

-   :setting:`EXTENSIONS`

-   :setting:`FEED_EXPORTERS`

-   :setting:`FEED_STORAGES`

-   :setting:`ITEM_PIPELINES`

-   :setting:`SCHEDULER`

-   :setting:`SCHEDULER_DISK_QUEUE`

-   :setting:`SCHEDULER_MEMORY_QUEUE`

-   :setting:`SCHEDULER_PRIORITY_QUEUE`

-   :setting:`SPIDER_MIDDLEWARES`

Third-party Scrapy components may also let you define additional Scrapy
components, usually configurable through :ref:`settings <topics-settings>`, to
modify their behavior.
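A component does not need to inherit from any Scrapy base class; it only needs
to be buildable by :func:`~scrapy.utils.misc.build_from_crawler`, which uses
the ``from_crawler`` classmethod of the component class when one is defined.
A minimal sketch (the ``MYEXT_ENABLED`` setting name is hypothetical):

.. code-block:: python

    class MyExtension:
        def __init__(self, enabled):
            self.enabled = enabled

        @classmethod
        def from_crawler(cls, crawler):
            # build_from_crawler() calls this classmethod when it exists,
            # passing the running Crawler so the component can read settings
            return cls(enabled=crawler.settings.getbool("MYEXT_ENABLED", True))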

.. _enforce-component-requirements:

Enforcing component requirements
================================

Sometimes, your components may only be intended to work under certain
conditions. For example, they may require a minimum version of Scrapy to work as
intended, or they may require certain settings to have specific values.

In addition to describing those conditions in the documentation of your
component, it is a good practice to raise an exception from the ``__init__``
method of your component if those conditions are not met at run time.

In the case of :ref:`downloader middlewares <topics-downloader-middleware>`,
:ref:`extensions <topics-extensions>`, :ref:`item pipelines
<topics-item-pipeline>`, and :ref:`spider middlewares
<topics-spider-middleware>`, you should raise
:exc:`scrapy.exceptions.NotConfigured`, passing a description of the issue as a
parameter to the exception so that it is printed in the logs, for the user to
see. For other components, feel free to raise whatever other exception feels
right to you; for example, :exc:`RuntimeError` would make sense for a Scrapy
version mismatch, while :exc:`ValueError` may be better if the issue is the
value of a setting.
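For example, an item pipeline might refuse to be enabled when a setting it
needs is missing. A sketch of that pattern follows; the ``MYPIPELINE_API_KEY``
setting name is hypothetical, and the ``try``/``except`` fallback only keeps
the snippet runnable where Scrapy is not installed:

.. code-block:: python

    try:
        from scrapy.exceptions import NotConfigured
    except ImportError:  # fallback so this sketch runs without Scrapy installed

        class NotConfigured(Exception):
            pass


    class MyPipeline:
        def __init__(self, api_key):
            self.api_key = api_key

        @classmethod
        def from_crawler(cls, crawler):
            api_key = crawler.settings.get("MYPIPELINE_API_KEY")
            if not api_key:
                # the message is logged, telling users why the component
                # was disabled
                raise NotConfigured("MYPIPELINE_API_KEY is not set")
            return cls(api_key)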

If your requirement is a minimum Scrapy version, you may use
:attr:`scrapy.__version__` to enforce your requirement. For example:

.. code-block:: python

    from packaging.version import parse as parse_version

    import scrapy


    class MyComponent:
        def __init__(self):
            if parse_version(scrapy.__version__) < parse_version("2.7"):
                raise RuntimeError(
                    f"{MyComponent.__qualname__} requires Scrapy 2.7 or "
                    f"later, which allows defining the process_spider_output "
                    f"method of spider middlewares as an asynchronous "
                    f"generator."
                )

API reference
=============

The following function can be used to create an instance of a component class:

.. autofunction:: scrapy.utils.misc.build_from_crawler

The following function can also be useful when implementing a component, for
example to include the import path of the component class in an error message:

.. autofunction:: scrapy.utils.python.global_object_name
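
For example, a component could use it to build an error message that names the
offending class. A sketch (the ``FOO`` setting name is hypothetical, and the
``try``/``except`` fallback is a simplified stand-in so that the snippet runs
where Scrapy is not installed):

.. code-block:: python

    try:
        from scrapy.utils.python import global_object_name
    except ImportError:  # simplified stand-in, runs without Scrapy installed

        def global_object_name(obj):
            return f"{obj.__module__}.{obj.__qualname__}"


    class MyComponent:
        pass


    # e.g. "myproject.extensions.MyComponent" in a real project
    message = f"{global_object_name(MyComponent)} requires the FOO setting"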