File: sep-017.rst

=======  ================
SEP      17
Title    Spider Contracts
Author   Insophia Team
Created  2010-06-10
Status   Draft
=======  ================

=========================
SEP-017: Spider Contracts
=========================

The motivation for Spider Contracts is to provide a lightweight mechanism for
testing your spiders, so that tests run quickly without having to wait for the
whole spider to finish. It is partially based on the `Design by contract
<https://en.wikipedia.org/wiki/Design_by_contract>`__ approach (hence its
name), where you define certain conditions that spider callbacks must meet,
and you provide example testing pages.

How it works
============

In the docstring of your spider callbacks, you write certain tags that define
the spider contract. For example, the URL of a sample page for that callback,
and what you expect to scrape from it.

Then you can run a command to check that the spider contracts are met.
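As a rough sketch of the tag-extraction step (the helper name and regex here
are illustrative, not part of the proposal), a checker could collect the
``@tag value`` lines from a callback docstring like this:

```python
import re


def parse_contract_tags(callback):
    """Collect @tag value pairs from a callback's docstring."""
    tags = {}
    for line in (callback.__doc__ or "").splitlines():
        match = re.match(r"\s*@(\w+)\s*(.*)", line)
        if match:
            name, value = match.groups()
            tags[name] = value.strip()
    return tags


def parse_product(response):
    """
    @url http://www.example.com/store/product.php?id=123
    @scrapes name, price, description
    """


print(parse_contract_tags(parse_product))
# {'url': 'http://www.example.com/store/product.php?id=123',
#  'scrapes': 'name, price, description'}
```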

Contract examples
=================

Example URL for simple callback
-------------------------------

The ``parse_product`` callback must return items containing the fields given in
``@scrapes``.

.. code-block:: python

   class ProductSpider(BaseSpider):
       def parse_product(self, response):
           """
           @url http://www.example.com/store/product.php?id=123
           @scrapes name, price, description
           """
           # ... extract and return the product item here

Chained callbacks
-----------------

The following spider contains two callbacks: one for logging into a site, and
the other for scraping user profile info.

The contracts assert that the first callback returns a Request and that the
second one scrapes the ``user``, ``name`` and ``email`` fields.

.. code-block:: python

   class UserProfileSpider(BaseSpider):
       def parse_login_page(self, response):
           """
           @url http://www.example.com/login.php
           @returns_request
           """
           # returns Request with callback=self.parse_profile_page

       def parse_profile_page(self, response):
           """
           @after parse_login_page
           @scrapes user, name, email
           """
           # ...
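To illustrate how ``@after`` chaining could be checked, a toy checker loop
might feed each response to the current callback and follow the chain when a
callback hands back the next step. Here a plain callable stands in for a
Request, and all names are illustrative:

```python
def parse_profile_page(response):
    return {"user": "jdoe", "name": "John Doe", "email": "jdoe@example.com"}


def parse_login_page(response):
    # stands in for returning Request(..., callback=self.parse_profile_page)
    return parse_profile_page


def run_chain(first_callback, responses):
    """Call each callback in turn, following 'requests' to the next callback."""
    callback, items = first_callback, []
    for response in responses:
        result = callback(response)
        if callable(result):      # a "request": switch to its callback
            callback = result
        else:                     # an item: record it for @scrapes checks
            items.append(result)
    return items


items = run_chain(parse_login_page, ["<login page>", "<profile page>"])
print(items[0]["email"])  # jdoe@example.com
```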

Tags reference
==============

Note that tags can also be extended by users, meaning that you can have your
own custom contract tags in your Scrapy project.

==================== ==========================================================
``@url``             URL of a sample page parsed by the callback
``@after``           the callback is called with the response generated by the
                     specified callback
``@scrapes``         list of fields that must be present in the item(s) scraped
                     by the callback
``@returns_request`` the callback must return one (and only one) Request
==================== ==========================================================

Some tag constraints:

 * a callback cannot contain both ``@url`` and ``@after``
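User-defined tags could plug in through a simple registry mapping tag names to
check functions. This is a hypothetical sketch: the ``@returns_items`` tag and
the registry API are invented here for illustration, not part of the proposal:

```python
CONTRACT_TAGS = {}


def contract_tag(name):
    """Register a check function for a custom contract tag."""
    def register(func):
        CONTRACT_TAGS[name] = func
        return func
    return register


@contract_tag("returns_items")
def check_returns_items(tag_value, callback_output):
    # hypothetical tag: assert the callback produced exactly N items
    expected = int(tag_value)
    items = [o for o in callback_output if isinstance(o, dict)]
    assert len(items) == expected, f"expected {expected} items, got {len(items)}"


# the checker would look up the tag name and run it against the callback output
CONTRACT_TAGS["returns_items"]("2", [{"name": "a"}, {"name": "b"}])
```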

Checking spider contracts
=========================

To check the contracts of a single spider:

::

   scrapy-ctl.py check example.com

Or to check all spiders:

::

   scrapy-ctl.py check

No need to wait for the whole spider to run.