.. _spec:

=======================
Framework specification
=======================

Learn how to build a :ref:`web-poet framework <frameworks>`.

Design principles
=================

:ref:`Page objects <page-objects>` should be flexible enough to be used with,
for example:

* synchronous or asynchronous code, whether callback-based or
  ``async def`` / ``await`` based,
* single-node or distributed systems,
* different underlying HTTP implementations, or no HTTP support at all.


Minimum requirements
====================

A web-poet framework must support building a :ref:`page object <page-objects>`
given a page object class.

It must also be able to build the :ref:`input objects <inputs>` of a page
object through dependency injection, i.e. based on the type hints of the page
object class and on any additional input data that those input objects
require, such as a target URL or a dictionary of :ref:`page parameters
<page-params>`.

You can implement dependency injection with the andi_ library, which handles
signature inspection, :data:`~typing.Optional` and :data:`~typing.Union`
annotations, as well as indirect dependencies. For practical examples, see the
source code of scrapy-poet_ and of the :mod:`web_poet.example` module.

.. _andi: https://github.com/scrapinghub/andi
.. _scrapy-poet: https://github.com/scrapinghub/scrapy-poet
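
To make the idea of type-hint-based dependency injection more concrete, here
is a minimal sketch that uses only the Python standard library. It is not how
andi_ or scrapy-poet_ work internally, and it covers none of the
``Optional``/``Union`` handling that andi_ offers. The ``Url``,
``PageParams``, ``Breadcrumbs`` and ``ProductPage`` classes and the ``build``
function are hypothetical names made up for this illustration; they are not
part of web-poet.

.. code-block:: python

    from dataclasses import dataclass
    from typing import get_type_hints


    @dataclass
    class Url:
        """Hypothetical input class holding the target URL."""
        value: str


    @dataclass
    class PageParams:
        """Hypothetical input class holding user-supplied parameters."""
        data: dict


    @dataclass
    class Breadcrumbs:
        """Hypothetical helper class, used here as an indirect dependency."""
        url: Url


    @dataclass
    class ProductPage:
        """Hypothetical page object class; its type hints declare its inputs."""
        breadcrumbs: Breadcrumbs
        params: PageParams

        def to_item(self) -> dict:
            return {
                "url": self.breadcrumbs.url.value,
                "currency": self.params.data.get("currency"),
            }


    def build(cls, providers):
        """Build ``cls``, resolving every type-hinted ``__init__`` parameter.

        ``providers`` maps a class to a zero-argument callable that returns an
        instance of it. Classes without a provider are built recursively from
        their own type hints, which is how indirect dependencies get resolved.
        """
        if cls in providers:
            return providers[cls]()
        hints = get_type_hints(cls.__init__)
        hints.pop("return", None)
        kwargs = {name: build(dep, providers) for name, dep in hints.items()}
        return cls(**kwargs)


    # The framework supplies the leaf inputs (assumed values for illustration).
    providers = {
        Url: lambda: Url("https://example.com"),
        PageParams: lambda: PageParams({"currency": "USD"}),
    }
    page = build(ProductPage, providers)
    item = page.to_item()

In this sketch the framework only needs to know how to create the leaf inputs
(``Url`` and ``PageParams``); everything else, including the intermediate
``Breadcrumbs`` object, follows from the type hints.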


Additional features
===================

To provide a better experience to your users, consider extending your web-poet
framework further to:

-   Support as many input classes from the :mod:`web_poet.page_inputs`
    module as possible.

-   Support returning a :ref:`page object <page-objects>` given a target URL
    and a desired :ref:`output item class <items>`, determining the right
    :ref:`page object class <page-object-classes>` to use based on :ref:`rules
    <framework-rules>`.

-   Allow users to request an :ref:`output item <items>` directly, instead of
    requesting a page object just to call its ``to_item`` method.

    If you do, consider supporting both synchronous and asynchronous
    definitions of the ``to_item`` method, e.g. using
    :func:`~.ensure_awaitable`.

-   Support :ref:`additional requests <framework-additional-requests>`.

-   Support :ref:`retries <framework-retries>`.

-   Let users set their own :ref:`rules <rules>`, e.g. to :ref:`resolve
    conflicts <rule-conflicts>`.
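
As an illustration of the point above about supporting both synchronous and
asynchronous ``to_item`` definitions, the following sketch shows one possible
approach using only the standard library. The ``await_if_needed`` helper, the
``get_item`` function and the two page classes are hypothetical names for this
example; in a real framework, the ``ensure_awaitable`` helper referenced above
is the natural choice for this role.

.. code-block:: python

    import asyncio
    import inspect


    async def await_if_needed(value):
        """Await ``value`` if it is awaitable, otherwise return it unchanged."""
        if inspect.isawaitable(value):
            return await value
        return value


    class SyncPage:
        """Hypothetical page object with a synchronous ``to_item``."""

        def to_item(self):
            return {"name": "sync"}


    class AsyncPage:
        """Hypothetical page object with an asynchronous ``to_item``."""

        async def to_item(self):
            return {"name": "async"}


    async def get_item(page):
        """Return the output item of ``page`` for either style of ``to_item``."""
        return await await_if_needed(page.to_item())


    sync_item = asyncio.run(get_item(SyncPage()))
    async_item = asyncio.run(get_item(AsyncPage()))

With a helper like this, the framework code that serves item requests does
not need to know which style a given page object class uses.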