=========   ==============================================================
SEP         8
Title       Item Parsers
Author      Pablo Hoffman
Created     2009-08-11
Status      Final (implemented with variations)
Obsoletes   :doc:`sep-001`, :doc:`sep-002`, :doc:`sep-003`, :doc:`sep-005`
=========   ==============================================================

======================
SEP-008 - Item Loaders
======================

Item Parsers are the final API proposed to implement the Item Builders/Loaders
first proposed in :doc:`sep-001`.

.. note:: This API was ultimately implemented under the name "Item Loaders"
          rather than "Item Parsers", with some minor fine-tuning of the API
          methods and semantics.

Dataflow
========

1. ``ItemParser.add_value()``

   1. **input_parser**
   2. store

2. ``ItemParser.add_xpath()`` *(only available in XPathItemParser)*

   1. selector.extract()
   2. **input_parser**
   3. store

3. ``ItemParser.populate_item()`` *(ex. get_item)*

   1. **output_parser**
   2. assign field
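
The dataflow above can be sketched in plain Python. This is only an illustration of the order of operations, not the proposed implementation: ``SketchItemParser``, ``strip_all`` and ``take_first`` are hypothetical names, the item is a plain ``dict``, and the ``add_xpath()`` path is left out because it needs a selector.

```python
from collections import defaultdict


class SketchItemParser:
    """Toy stand-in for the proposed ItemParser (hypothetical, not Scrapy code)."""

    def __init__(self, input_parser, output_parser):
        self.input_parser = input_parser
        self.output_parser = output_parser
        self._values = defaultdict(list)

    def add_value(self, field, value):
        # Dataflow step 1: run the input parser, then store the result.
        values = value if isinstance(value, (list, tuple)) else [value]
        self._values[field].extend(self.input_parser(values))

    def populate_item(self):
        # Dataflow step 3: run the output parser on the stored values,
        # then assign the result to the field.
        return {
            field: self.output_parser(values)
            for field, values in self._values.items()
        }


def strip_all(values):
    # Hypothetical input parser: trim whitespace from every value.
    return [value.strip() for value in values]


def take_first(values):
    # Hypothetical output parser: keep only the first stored value.
    return values[0] if values else None


parser = SketchItemParser(input_parser=strip_all, output_parser=take_first)
parser.add_value("name", "  Plasma TV ")
parser.add_value("name", "second value")
item = parser.populate_item()
```

The input parser runs once per ``add_value()`` call, while the output parser runs once per field when the item is populated, so the second value is stored but dropped by ``take_first``.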

Modules and classes
===================

- ``scrapy.contrib.itemparser.ItemParser``
- ``scrapy.contrib.itemparser.XPathItemParser``
- ``scrapy.contrib.itemparser.parsers.MapConcat`` (ex. ``TreeExpander``)
- ``scrapy.contrib.itemparser.parsers.TakeFirst``
- ``scrapy.contrib.itemparser.parsers.Join``
- ``scrapy.contrib.itemparser.parsers.Identity``
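
This SEP does not spell out what each parser does. The sketch below is a guess at their behaviour, assuming semantics close to the processors that later shipped with Item Loaders; in particular, treating ``MapConcat`` as "apply each function to every value and concatenate the results" is an assumption based on its name.

```python
class Identity:
    """Return the values unchanged."""

    def __call__(self, values):
        return values


class TakeFirst:
    """Return the first value that is neither None nor an empty string."""

    def __call__(self, values):
        for value in values:
            if value is not None and value != "":
                return value
        return None


class Join:
    """Join string values with a separator (a single space by default)."""

    def __init__(self, separator=" "):
        self.separator = separator

    def __call__(self, values):
        return self.separator.join(values)


class MapConcat:
    """Assumed semantics: apply each function to every value in turn and
    concatenate the results, flattening lists and dropping None."""

    def __init__(self, *functions):
        self.functions = functions

    def __call__(self, values):
        for function in self.functions:
            next_values = []
            for value in values:
                result = function(value)
                if result is None:
                    continue
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)
                else:
                    next_values.append(result)
            values = next_values
        return list(values)


split_words = MapConcat(str.strip, str.split)
words = split_words(["  red green ", "blue"])
```

Each parser is a callable that takes a list of values, which is what lets the same objects be used as either input or output parsers.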

Public API
==========

- ``ItemParser.add_value()``
- ``ItemParser.replace_value()``
- ``ItemParser.populate_item()`` *(returns item populated)*

- ``ItemParser.get_collected_values()`` *(note the 's' in values)*
- ``ItemParser.parse_field()``

- ``ItemParser.get_input_parser()``
- ``ItemParser.get_output_parser()``

- ``ItemParser.context``

- ``ItemParser.default_item_class``
- ``ItemParser.default_input_parser``
- ``ItemParser.default_output_parser``
- ``ItemParser.*field*_in``
- ``ItemParser.*field*_out``
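
One plausible reading of the ``*field*_in`` / ``*field*_out`` attributes is that ``get_input_parser()`` and ``get_output_parser()`` look for a per-field attribute and fall back to the class default. The sketch below assumes exactly that; ``SketchItemParser``, ``identity`` and ``take_first`` are hypothetical, and the lookup rule is an assumption rather than something this SEP states.

```python
def identity(values):
    # Hypothetical default parser: leave the values untouched.
    return values


def take_first(values):
    # Hypothetical output parser: keep only the first value.
    return values[0] if values else None


class SketchItemParser:
    """Toy stand-in showing an assumed per-field parser lookup."""

    # staticmethod keeps plain functions from being bound to the instance.
    default_input_parser = staticmethod(identity)
    default_output_parser = staticmethod(identity)

    def get_input_parser(self, field):
        # Prefer a "<field>_in" attribute, else use the default.
        return getattr(self, field + "_in", self.default_input_parser)

    def get_output_parser(self, field):
        # Prefer a "<field>_out" attribute, else use the default.
        return getattr(self, field + "_out", self.default_output_parser)


class ProductParser(SketchItemParser):
    price_out = staticmethod(take_first)


parser = ProductParser()
price_parser = parser.get_output_parser("price")  # per-field override
name_parser = parser.get_output_parser("name")    # falls back to the default
```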

Alternative Public API Proposal
===============================

- ``ItemLoader.add_value()``
- ``ItemLoader.replace_value()``
- ``ItemLoader.load_item()`` *(returns loaded item)*

- ``ItemLoader.get_stored_values()`` or ``ItemLoader.get_values()`` *(returns the ItemLoader values)*
- ``ItemLoader.get_output_value()``

- ``ItemLoader.get_input_processor()`` or ``ItemLoader.get_in_processor()`` *(short version)*
- ``ItemLoader.get_output_processor()`` or ``ItemLoader.get_out_processor()`` *(short version)*

- ``ItemLoader.context``

- ``ItemLoader.default_item_class``
- ``ItemLoader.default_input_processor`` or ``ItemLoader.default_in_processor`` *(short version)*
- ``ItemLoader.default_output_processor`` or ``ItemLoader.default_out_processor`` *(short version)*
- ``ItemLoader.*field*_in``
- ``ItemLoader.*field*_out``

Usage example: declaring Item Parsers
=====================================

.. code-block:: python

   from scrapy.contrib.itemparser import XPathItemParser, parsers


   class ProductParser(XPathItemParser):
       name_in = parsers.MapConcat(removetags, filterx)
       price_in = parsers.MapConcat(...)

       price_out = parsers.TakeFirst()

Usage example: declaring parsers in Fields
==========================================

.. code-block:: python

   class Product(Item):
       name = Field(output_parser=parsers.Join(), ...)
       price = Field(output_parser=parsers.TakeFirst(), ...)

       description = Field(input_parser=parsers.MapConcat(removetags))
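
Declaring parsers in Fields implies that the parser is read from the field's metadata. The sketch below shows one way that lookup could work, assuming a Field behaves like a plain ``dict`` of metadata (as Scrapy's ``Field`` does). The ``Field`` stand-in, ``get_output_parser()`` and the ``stock`` field are hypothetical and only serve the illustration.

```python
class Field(dict):
    """Stand-in for a Field: a plain dict of metadata."""


def identity(values):
    # Hypothetical default parser: leave the values untouched.
    return values


def take_first(values):
    # Hypothetical parser: keep only the first value.
    return values[0] if values else None


def join(values):
    # Hypothetical parser: join string values with a space.
    return " ".join(values)


# Field declarations mirroring the Product item above, plus one field
# ("stock") without any parser metadata.
fields = {
    "name": Field(output_parser=join),
    "price": Field(output_parser=take_first),
    "stock": Field(),
}


def get_output_parser(fields, field_name, default=identity):
    # Read the parser from the field metadata, else use the default.
    return fields[field_name].get("output_parser", default)


name = get_output_parser(fields, "name")(["Plasma", "TV"])
price = get_output_parser(fields, "price")(["999", "1099"])
stock = get_output_parser(fields, "stock")(["3"])
```

How metadata declared on a Field should interact with a ``*field*_out`` attribute declared on the parser class is not covered by this SEP, so the sketch does not pick a precedence.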