=======  ==============================================
SEP      15
Title    ScrapyManager and SpiderManager API refactoring
Author   Insophia Team
Created  2010-03-10
Status   Final
=======  ==============================================

========================================================
SEP-015: ScrapyManager and SpiderManager API refactoring
========================================================

This SEP proposes refactoring the ``ScrapyManager`` and ``SpiderManager``
APIs.

SpiderManager
=============

- ``get(spider_name)`` -> ``Spider`` instance
- ``find_by_request(request)`` -> list of spider names
- ``list()`` -> list of spider names

- remove ``fromdomain()``, ``fromurl()``
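
The three methods above can be sketched as follows. This is only an illustration of the proposed interface, not Scrapy's implementation: the ``Request`` and ``Spider`` classes are hypothetical stand-ins, and matching a request to a spider by ``allowed_domains`` is an assumption made for the example.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request; only the URL matters here."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical stand-in for a spider, identified by its name."""

    def __init__(self, name, allowed_domains=()):
        self.name = name
        self.allowed_domains = tuple(allowed_domains)

    def handles(self, request):
        # Assumption: a spider claims a request when the request's host is
        # one of its allowed domains or a subdomain of one.
        host = urlparse(request.url).hostname or ""
        return any(
            host == domain or host.endswith("." + domain)
            for domain in self.allowed_domains
        )


class SpiderManager:
    """Sketch of the proposed API: get(), find_by_request(), list()."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        # Returns a Spider instance; raises KeyError for an unknown name.
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Returns the names (not the instances) of all matching spiders.
        return sorted(
            name for name, spider in self._spiders.items()
            if spider.handles(request)
        )

    def list(self):
        # Returns the names of all registered spiders.
        return sorted(self._spiders)
```

Returning names rather than instances from ``find_by_request()`` keeps it consistent with ``list()``; callers that need the instance follow up with ``get()``.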

ScrapyManager
=============

- ``crawl_request(request, spider=None)``
   - calls ``SpiderManager.find_by_request(request)`` if spider is ``None``
   - fails unless exactly one spider is returned
- ``crawl_spider(spider)``
   - calls ``spider.start_requests()``
- ``crawl_spider_name(spider_name)``
   - calls ``SpiderManager.get(spider_name)``
   - calls ``spider.start_requests()``
- ``crawl_url(url)``
   - calls ``spider.make_requests_from_url()``

- remove ``crawl()``, ``runonce()``

Instead of using ``runonce()``, commands (such as ``crawl`` and ``parse``)
would call one of the ``crawl_*`` methods and then ``start()``.
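
A minimal sketch of how the ``crawl_*`` methods could fit together is shown below. Several details are assumptions made for illustration and are not stated in this SEP: requests are collected in a ``scheduled`` list until ``start()`` is called, ``crawl_url()`` picks its spider through ``find_by_request()``, and ``make_requests_from_url()`` returns a single request. ``Request``, ``DemoSpider`` and ``DemoSpiderManager`` are hypothetical stand-ins.

```python
class Request:
    """Hypothetical stand-in for a request."""

    def __init__(self, url):
        self.url = url


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (request, spider) pairs waiting for start()
        self.started = False

    def _resolve(self, request):
        # Fails unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise ValueError(f"expected one spider for {request.url}, got {names}")
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self.scheduled.append((request, spider))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.crawl_request(request, spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider for a bare URL is looked up like a request.
        spider = self._resolve(Request(url))
        self.crawl_request(spider.make_requests_from_url(url), spider)

    def start(self):
        # A real manager would run the engine here; the sketch only flags it.
        self.started = True


class DemoSpider:
    """Hypothetical spider bound to a single domain."""

    def __init__(self, name, domain):
        self.name = name
        self.domain = domain

    def start_requests(self):
        return [Request(f"http://{self.domain}/")]

    def make_requests_from_url(self, url):
        return Request(url)


class DemoSpiderManager:
    """Hypothetical spider manager exposing get() and find_by_request()."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        return [n for n, s in self._spiders.items() if s.domain in request.url]


manager = ScrapyManager(DemoSpiderManager([DemoSpider("demo", "example.com")]))
manager.crawl_spider_name("demo")
manager.crawl_url("http://example.com/page")
manager.start()
```

Every ``crawl_*`` method ends in ``crawl_request()``, so scheduling happens in one place and ``start()`` is the only call that begins execution.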

Changes to Commands
===================

- ``if is_url(arg):``
   - calls ``ScrapyManager.crawl_url(arg)``
- ``else:``
   - calls ``ScrapyManager.crawl_spider_name(arg)``
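
The dispatch described above might look like the following sketch. The ``is_url()`` helper shown here is a hypothetical implementation based on the URL scheme, and ``schedule_args()`` and ``RecordingManager`` are illustrative names that do not come from this SEP; the recording class only logs calls so the example runs without a crawler.

```python
from urllib.parse import urlparse


def is_url(arg):
    """Hypothetical helper: arg is a URL if it has an http(s) or file scheme."""
    return urlparse(arg).scheme in ("http", "https", "file")


def schedule_args(manager, args):
    """Schedule each command-line argument, then start the manager."""
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()


class RecordingManager:
    """Stand-in for ScrapyManager that records calls instead of crawling."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


manager = RecordingManager()
schedule_args(manager, ["http://example.com/", "myspider"])
```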

Pending issues
==============

- Should ``ScrapyManager.crawl_*`` be renamed to ``schedule_*`` or ``add_*``?
- Should the lookup method be ``SpiderManager.find_by_request(request)`` or
  ``SpiderManager.search(request=request)``?