======= ================
SEP     17
Title   Spider Contracts
Author  Insophia Team
Created 2010-06-10
Status  Draft
======= ================

=========================
SEP-017: Spider Contracts
=========================
The motivation for Spider Contracts is to build a lightweight mechanism for
testing your spiders, and to be able to run the tests quickly without having
to wait for the whole spider to run. It's partially based on the
`Design by contract <https://en.wikipedia.org/wiki/Design_by_contract>`_
approach (hence its name), where you define certain conditions that spider
callbacks must meet, and you give example testing pages.

How it works
============

In the docstring of your spider callbacks, you write certain tags that define
the spider contract: for example, the URL of a sample page for that callback,
and what you expect to scrape from it.

Then you can run a command to check that the spider contracts are met.
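
The tag-extraction step could be sketched as follows. The helper name and
regex are illustrative only; this SEP does not prescribe an implementation:

.. code-block:: python

    import re

    # Hypothetical sketch: pull @-tags out of a callback's docstring.
    def extract_contract_tags(callback):
        """Return a dict mapping each tag name to its argument string."""
        tags = {}
        for line in (callback.__doc__ or "").splitlines():
            match = re.match(r"\s*@(\w+)\s*(.*)", line)
            if match:
                name, args = match.groups()
                tags[name] = args.strip()
        return tags


    def parse_product(response):
        """
        @url http://www.example.com/store/product.php?id=123
        @scrapes name, price, description
        """


    print(extract_contract_tags(parse_product))
    # {'url': 'http://www.example.com/store/product.php?id=123',
    #  'scrapes': 'name, price, description'}
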
Contract examples
=================

Example URL for simple callback
-------------------------------

The ``parse_product`` callback must return items containing the fields given in
``@scrapes``.

.. code-block:: python

    class ProductSpider(BaseSpider):

        def parse_product(self, response):
            """
            @url http://www.example.com/store/product.php?id=123
            @scrapes name, price, description
            """

Chained callbacks
-----------------

The following spider contains two callbacks: one for logging into a site, and
the other for scraping user profile info.

The contracts assert that the first callback returns a Request and the second
one scrapes the ``user``, ``name`` and ``email`` fields.

.. code-block:: python

    class UserProfileSpider(BaseSpider):

        def parse_login_page(self, response):
            """
            @url http://www.example.com/login.php
            @returns_request
            """
            # returns Request with callback=self.parse_profile_page

        def parse_profile_page(self, response):
            """
            @after parse_login_page
            @scrapes user, name, email
            """
            # ...

Tags reference
==============

Note that tags can also be extended by users, meaning that you can have your
own custom contract tags in your Scrapy project.

==================== ==========================================================
``@url``             url of a sample page parsed by the callback
``@after``           the callback is called with the response generated by the
                     specified callback
``@scrapes``         list of fields that must be present in the item(s) scraped
                     by the callback
``@returns_request`` the callback must return one (and only one) Request
==================== ==========================================================

Some tag constraints:

* a callback cannot contain both ``@url`` and ``@after``

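
The user-extension point mentioned above is not specified by this SEP. One
possible shape, with a purely hypothetical registry, decorator and checker
signature, could look like:

.. code-block:: python

    # Hypothetical sketch of user-defined contract tags: a registry mapping
    # tag names to checker functions. This extension API is illustrative
    # only and not part of the proposal.
    CONTRACT_TAGS = {}

    def contract_tag(name):
        """Register a checker for a custom @<name> tag."""
        def decorator(func):
            CONTRACT_TAGS[name] = func
            return func
        return decorator

    @contract_tag("min_items")
    def check_min_items(args, results):
        """Fail unless the callback produced at least `args` items."""
        minimum = int(args)
        items = [r for r in results if isinstance(r, dict)]
        assert len(items) >= minimum, (
            "expected at least %d items, got %d" % (minimum, len(items)))

    # A checker would then run against a callback's collected output:
    check_min_items("2", [{"name": "a"}, {"name": "b"}])  # passes silently
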
Checking spider contracts
=========================

To check the contracts of a single spider::

    scrapy-ctl.py check example.com

Or to check all spiders::

    scrapy-ctl.py check

No need to wait for the whole spider to run.