======= ================================================
SEP     15
Title   ScrapyManager and SpiderManager API refactoring
Author  Insophia Team
Created 2010-03-10
Status  Final
======= ================================================

========================================================
SEP-015: ScrapyManager and SpiderManager API refactoring
========================================================

This SEP proposes a refactoring of the ``ScrapyManager`` and ``SpiderManager``
APIs.

SpiderManager
=============

- ``get(spider_name)`` -> ``Spider`` instance
- ``find_by_request(request)`` -> list of spider names
- ``list()`` -> list of spider names
- remove ``fromdomain()``, ``fromurl()``

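A minimal sketch of the proposed ``SpiderManager`` API follows. It is not Scrapy's implementation: ``Request`` and ``Spider`` are simplified, hypothetical stand-ins, and the matching rule used by ``find_by_request()`` (the request host equals, or is a subdomain of, one of the spider's ``allowed_domains``) is an assumption made for illustration.

.. code-block:: python

    from urllib.parse import urlparse


    class Request:
        """Hypothetical stand-in for a Scrapy request: only the URL is modelled."""

        def __init__(self, url):
            self.url = url


    class Spider:
        """Hypothetical stand-in spider base class: only name and domains."""

        name = ""
        allowed_domains = ()


    class SpiderManager:
        """Sketch of the proposed get() / find_by_request() / list() API."""

        def __init__(self, spider_classes):
            # Index spider classes by their (unique) name.
            self._spiders = {cls.name: cls for cls in spider_classes}

        def get(self, spider_name):
            # Return a new Spider instance; raises KeyError for unknown names.
            return self._spiders[spider_name]()

        def find_by_request(self, request):
            # Assumed rule: the request host equals, or is a subdomain of,
            # one of the spider's allowed_domains.
            host = urlparse(request.url).hostname or ""
            return [
                name
                for name, cls in self._spiders.items()
                if any(host == d or host.endswith("." + d) for d in cls.allowed_domains)
            ]

        def list(self):
            # Names of all registered spiders, sorted for stable output.
            return sorted(self._spiders)


    class ExampleSpider(Spider):
        name = "example"
        allowed_domains = ("example.com",)


    class OtherSpider(Spider):
        name = "other"
        allowed_domains = ("other.org",)


    manager = SpiderManager([ExampleSpider, OtherSpider])
    print(manager.list())  # ['example', 'other']
    print(manager.find_by_request(Request("http://www.example.com/page")))  # ['example']

Because ``find_by_request()`` returns a list of names rather than a single spider, the caller decides what to do with zero or several matches; ``ScrapyManager.crawl_request()`` treats anything other than exactly one match as a failure.
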
ScrapyManager
=============

- ``crawl_request(request, spider=None)``

  - calls ``SpiderManager.find_by_request(request)`` if ``spider`` is ``None``
  - fails if the number of spiders returned is not exactly 1

- ``crawl_spider(spider)``

  - calls ``spider.start_requests()``

- ``crawl_spider_name(spider_name)``

  - calls ``SpiderManager.get(spider_name)``
  - calls ``spider.start_requests()`` on the returned spider

- ``crawl_url(url)``

  - calls ``spider.make_requests_from_url(url)``

- remove ``crawl()`` and ``runonce()``

Instead of using ``runonce()``, commands (such as ``crawl`` and ``parse``) would
call ``crawl_*`` and then ``start()``.

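The ``ScrapyManager`` methods above could be wired together roughly as in this hypothetical sketch. The stand-in ``Request``, ``Spider`` and ``SpiderManager`` classes are repeated so the example runs on its own; ``start()`` merely drains and reports the scheduled ``(spider name, URL)`` pairs instead of running a crawl engine; and since the proposal does not say which spider ``crawl_url()`` uses, the sketch assumes it resolves one the same way ``crawl_request()`` does.

.. code-block:: python

    from urllib.parse import urlparse


    class Request:
        """Hypothetical stand-in request: only the URL is modelled."""

        def __init__(self, url):
            self.url = url


    class Spider:
        """Hypothetical stand-in spider base class."""

        name = ""
        allowed_domains = ()
        start_urls = ()

        def start_requests(self):
            return [self.make_requests_from_url(url) for url in self.start_urls]

        def make_requests_from_url(self, url):
            return Request(url)


    class SpiderManager:
        """Stub exposing the proposed get() and find_by_request() methods."""

        def __init__(self, spider_classes):
            self._spiders = {cls.name: cls for cls in spider_classes}

        def get(self, spider_name):
            return self._spiders[spider_name]()

        def find_by_request(self, request):
            # Assumed rule: match the request host against allowed_domains.
            host = urlparse(request.url).hostname or ""
            return [
                name
                for name, cls in self._spiders.items()
                if any(host == d or host.endswith("." + d) for d in cls.allowed_domains)
            ]


    class ScrapyManager:
        """Sketch of the proposed crawl_* API; start() stands in for the engine."""

        def __init__(self, spiders):
            self.spiders = spiders
            self._pending = []  # (spider, request) pairs scheduled so far

        def _spider_for(self, request):
            # The request must map to exactly one spider name.
            names = self.spiders.find_by_request(request)
            if len(names) != 1:
                raise ValueError(f"expected 1 spider for {request.url}, found {names}")
            return self.spiders.get(names[0])

        def crawl_request(self, request, spider=None):
            if spider is None:
                spider = self._spider_for(request)
            self._pending.append((spider, request))

        def crawl_spider(self, spider):
            for request in spider.start_requests():
                self._pending.append((spider, request))

        def crawl_spider_name(self, spider_name):
            self.crawl_spider(self.spiders.get(spider_name))

        def crawl_url(self, url):
            # Assumption: resolve the spider from a probe request, like crawl_request().
            spider = self._spider_for(Request(url))
            self._pending.append((spider, spider.make_requests_from_url(url)))

        def start(self):
            # Stand-in for running the engine: drain and report scheduled work.
            scheduled = [(spider.name, request.url) for spider, request in self._pending]
            self._pending.clear()
            return scheduled


    class ExampleSpider(Spider):
        name = "example"
        allowed_domains = ("example.com",)
        start_urls = ("http://example.com/",)


    manager = ScrapyManager(SpiderManager([ExampleSpider]))
    manager.crawl_spider_name("example")
    manager.crawl_url("http://example.com/about")
    scheduled = manager.start()
    print(scheduled)
    # [('example', 'http://example.com/'), ('example', 'http://example.com/about')]

Separating scheduling (``crawl_*``) from execution (``start()``) is what lets a command replace the single ``runonce()`` call: it schedules whatever it needs and then starts once.
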
Changes to Commands
===================

- ``if is_url(arg):``

  - calls ``ScrapyManager.crawl_url(arg)``

- ``else:``

  - calls ``ScrapyManager.crawl_spider_name(arg)``

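The command dispatch above might look like this in a ``crawl``-style command. Everything here is hypothetical: the body of ``run_crawl_command()``, the ``is_url()`` heuristic (an argument with both a scheme and a host is treated as a URL), and ``RecordingManager``, a fake that only records calls in place of a real ``ScrapyManager``.

.. code-block:: python

    from urllib.parse import urlparse


    def is_url(arg):
        # Assumed heuristic: a URL has both a scheme and a network location.
        parsed = urlparse(arg)
        return bool(parsed.scheme and parsed.netloc)


    class RecordingManager:
        """Fake ScrapyManager that only records which methods were called."""

        def __init__(self):
            self.calls = []

        def crawl_url(self, url):
            self.calls.append(("crawl_url", url))

        def crawl_spider_name(self, spider_name):
            self.calls.append(("crawl_spider_name", spider_name))

        def start(self):
            self.calls.append(("start",))


    def run_crawl_command(manager, args):
        """Hypothetical command body: schedule every argument, then start once."""
        for arg in args:
            if is_url(arg):
                manager.crawl_url(arg)
            else:
                manager.crawl_spider_name(arg)
        manager.start()


    manager = RecordingManager()
    run_crawl_command(manager, ["http://example.com/", "example"])
    print(manager.calls)
    # [('crawl_url', 'http://example.com/'), ('crawl_spider_name', 'example'), ('start',)]
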
Pending issues
==============

- should we rename ``ScrapyManager.crawl_*`` to ``schedule_*`` or ``add_*``?
- ``SpiderManager.find_by_request`` or ``SpiderManager.search(request=request)``?