=======  =============================
SEP      7
Title    ItemLoader processors library
Author   Ismael Carnales
Created  2009-08-10
Status   Draft
=======  =============================

======================================
SEP-007: ItemLoader processors library
======================================

This SEP proposes a library of ``ItemLoader`` processors to ship with Scrapy.

date.py
=======

``to_date``
-----------

Converts a date string to a ``YYYY-MM-DD`` string suitable for ``DateField``.

**Decision**: Obsolete. ``DateField`` doesn't exist anymore.
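
For reference, the conversion described above can be sketched in a few lines. This is only an illustration: the ``fmt`` parameter and its default input format are assumptions, since this SEP does not say which input formats the original processor accepted.

```python
from datetime import datetime


def to_date(value, fmt="%d/%m/%Y"):
    """Hypothetical sketch: parse ``value`` with ``fmt`` (an assumed
    parameter) and return an ISO ``YYYY-MM-DD`` string."""
    return datetime.strptime(value.strip(), fmt).date().isoformat()


print(to_date("10/08/2009"))  # 2009-08-10
```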

extraction.py
=============

``extract``
-----------

This adaptor tries to extract data from the given locations. Any
``XPathSelector`` among them is extracted, and any other data is added
as-is to the result.

**Decision**: Obsolete. Functionality included in ``XpathLoader``.

``ExtractImageLinks``
---------------------

This adaptor may receive either ``XPathSelector`` objects pointing to the
locations where image URLs should be looked for, or just a list of XPath
expressions (which will be turned into selectors anyway).

**Decision**: XXX

markup.py
=========

``remove_tags``
---------------

Factory that returns an adaptor for removing each tag in the ``tags`` parameter
found in the given value. If no ``tags`` are specified, all tags are removed.

**Decision**: XXX
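
A minimal sketch of such a factory, assuming a regular-expression approach is acceptable for simple markup (it is not a full HTML parser, and the exact signature is an assumption rather than the proposed API):

```python
import re


def remove_tags(tags=()):
    """Return a processor that strips the given tags, or every tag
    when ``tags`` is empty. Only the tags are removed, not their content."""
    if tags:
        names = "|".join(re.escape(tag) for tag in tags)
        # Matches opening, closing and self-closing forms of the named tags.
        pattern = re.compile(r"</?(?:%s)(?:\s[^>]*)?/?>" % names, re.IGNORECASE)
    else:
        pattern = re.compile(r"<[^>]+>")

    def processor(value):
        return pattern.sub("", value)

    return processor


strip_bold = remove_tags(("b",))
print(strip_bold("<p>Hello <b>sir</b></p>"))  # <p>Hello sir</p>
```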

``remove_root``
---------------

This adaptor removes the root tag of the given string or unicode object, if
one is found.

**Decision**: XXX
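
One possible way to implement this, assuming the input is a single well-formed element, is sketched below. The regular expression is illustrative only. It checks that the closing tag has the same name as the opening one, but it does not validate nesting.

```python
import re

# Opening tag (name captured), the inner content, then the matching closing tag.
_ROOT_RE = re.compile(r"^\s*<([\w:-]+)[^>]*>(.*)</\1\s*>\s*$", re.DOTALL)


def remove_root(value):
    """Return ``value`` without its outermost tag, or unchanged when
    no enclosing root tag is found."""
    match = _ROOT_RE.match(value)
    return match.group(2) if match else value


print(remove_root("<div class='a'><p>hi</p></div>"))  # <p>hi</p>
```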

``replace_escape``
------------------

Factory that returns an adaptor for removing/replacing each escape character in
the ``wich_ones`` parameter found in the given value.

**Decision**: XXX
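
A rough sketch of this factory follows. The parameter name ``wich_ones`` is spelled as in the text above. The default character set and the ``replace_by`` parameter are assumptions made for illustration.

```python
def replace_escape(wich_ones=("\n", "\t", "\r"), replace_by=""):
    """Return a processor that replaces each character listed in
    ``wich_ones`` with ``replace_by`` (an assumed parameter)."""

    def processor(value):
        for char in wich_ones:
            value = value.replace(char, replace_by)
        return value

    return processor


flatten = replace_escape(("\n",), replace_by=" ")
print(flatten("first line\nsecond line"))  # first line second line
```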

``unquote``
-----------

This factory returns an adaptor that receives a string or unicode object,
removes all CDATA sections and entities (except entities inside CDATA sections
and those listed in the ``keep`` parameter), and returns a new string or
unicode object.

**Decision**: XXX

misc.py
=======

``to_unicode``
--------------

Receives a string and converts it to unicode using the given encoding (UTF-8
if none is specified), returning a new unicode object. E.g.:

::

   >>> to_unicode('it costs 20\xe2\x82\xac, or 30\xc2\xa3')
   [u'it costs 20\u20ac, or 30\xa3']

**Decision**: XXX
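
In Python 3 terms, where byte strings and text are separate types, the same idea could look like the sketch below. It returns a single string rather than a list, and passing already-decoded text through unchanged is an assumption.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode ``value`` with ``encoding`` if it is a byte string;
    return text input unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_unicode(b"it costs 20\xe2\x82\xac, or 30\xc2\xa3"))
```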

``clean_spaces``
----------------

Collapses runs of multiple spaces in the given string into single spaces. E.g.:

::

   >>> clean_spaces(u'Hello   sir')
   u'Hello sir'

**Decision**: XXX
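
A minimal sketch, assuming only literal space characters are collapsed (tabs and newlines are left alone):

```python
import re

_MULTISPACE_RE = re.compile(r" {2,}")


def clean_spaces(value):
    """Replace every run of two or more spaces with a single space."""
    return _MULTISPACE_RE.sub(" ", value)


print(clean_spaces("Hello   sir"))  # Hello sir
```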

``drop_empty``
--------------

Removes any item that evaluates to false from the provided iterable. E.g.:

::

   >>> drop_empty([0, 'this', None, 'is', False, 'an example'])
   ['this', 'is', 'an example']

**Decision**: Obsolete. Functionality included in reducers.
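
For reference, the behaviour shown in the example amounts to a truthiness filter. A one-line sketch:

```python
def drop_empty(iterable):
    """Return a list with every falsy item (None, 0, False, '') removed."""
    return [item for item in iterable if item]


print(drop_empty([0, "this", None, "is", False, "an example"]))
# ['this', 'is', 'an example']
```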

``delist``
----------

This factory returns an adaptor that joins an iterable with the specified
delimiter.

**Decision**: Obsolete. Functionality included in reducers.
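
A sketch of such a factory, assuming the iterable contains only strings and that the delimiter defaults to the empty string:

```python
def delist(delimiter=""):
    """Return a processor that joins an iterable of strings with
    ``delimiter``."""

    def processor(values):
        return delimiter.join(values)

    return processor


join_with_comma = delist(", ")
print(join_with_comma(["red", "green", "blue"]))  # red, green, blue
```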

``Regex``
----------

This adaptor must receive either a list of strings or an ``XPathSelector``, and
returns a new list with the matches of the given regular expression against
those strings. The regular expression is passed as a keyword argument and is
mandatory.

**Decision**: XXX
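
The string-handling half of this adaptor could be sketched as below. Selector input is left out, and the class layout and the keyword name ``regex`` are assumptions made for illustration.

```python
import re


class Regex:
    """Hypothetical sketch: collect all matches of a mandatory,
    keyword-only ``regex`` across a list of strings."""

    def __init__(self, *, regex):
        self.regex = re.compile(regex)

    def __call__(self, values):
        # Accept a single string as a convenience.
        if isinstance(values, str):
            values = [values]
        matches = []
        for value in values:
            matches.extend(self.regex.findall(value))
        return matches


digits = Regex(regex=r"\d+")
print(digits(["a1b22", "c333"]))  # ['1', '22', '333']
```

Note that ``re.findall`` returns the captured groups instead of the whole match when the pattern contains groups, so the shape of the result depends on the pattern supplied.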