.. _advanced.normalization:

Content Normalization
=====================

:program:`Universal Feed Parser` can parse many different types of feeds: Atom,
:abbr:`CDF (Channel Definition Format)`, and nine different versions of
:abbr:`RSS (Rich Site Summary)`.  You should not be forced to learn the
differences between these formats.  :program:`Universal Feed Parser` does its
best to ensure that you can treat all feeds the same way, regardless of format
or version.

You can access the basic elements of an Atom feed using :abbr:`RSS (Rich Site Summary)` terminology.

Accessing an Atom feed as an :abbr:`RSS (Rich Site Summary)` feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d['channel']['title']
    u'Sample Feed'
    >>> d['channel']['link']
    u'http://example.org/'
    >>> d['channel']['description']
    u'For documentation <em>only</em>'
    >>> len(d['items'])
    1
    >>> e = d['items'][0]
    >>> e['title']
    u'First entry title'
    >>> e['link']
    u'http://example.org/entry/3'
    >>> e['description']
    u'Watch out for nasty tricks'
    >>> e['author']
    u'Mark Pilgrim (mark@example.org)'
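
One way to picture this kind of aliasing is a dictionary that falls back from an older key name to its newer equivalent. The sketch below is only an illustration and is not the library's implementation: the class name ``AliasDict`` and the ``ALIASES`` table are hypothetical, and the sample values are copied from the example above.

```python
class AliasDict(dict):
    """Hypothetical dict that resolves legacy key names to canonical ones."""

    # Assumed alias table, for illustration only.
    # A legacy key may map to several candidate canonical keys.
    ALIASES = {
        "channel": ["feed"],
        "items": ["entries"],
        "description": ["summary", "subtitle"],
    }

    def __getitem__(self, key):
        # Only fall back to an alias when the key itself is not stored.
        if key in self.ALIASES and not dict.__contains__(self, key):
            for canonical in self.ALIASES[key]:
                if dict.__contains__(self, canonical):
                    return dict.__getitem__(self, canonical)
        return dict.__getitem__(self, key)

    def __getattr__(self, name):
        # Allow attribute-style access (d.feed) alongside d['feed'].
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


feed = AliasDict(title="Sample Feed", subtitle="For documentation <em>only</em>")
entry = AliasDict(title="First entry title", summary="Watch out for nasty tricks")
parsed = AliasDict(feed=feed, entries=[entry])

print(parsed["channel"]["title"])         # legacy name, resolved to 'feed'
print(parsed["channel"]["description"])   # resolved to 'subtitle'
print(parsed["items"][0]["description"])  # resolved to 'summary'
print(parsed.entries[0].summary)          # canonical names, attribute style
```

Because the fallback happens at lookup time, each value is stored once under its canonical name and both vocabularies read the same data.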


The same thing works in reverse: you can access :abbr:`RSS (Rich Site Summary)` feeds as if they were Atom feeds.

Accessing an :abbr:`RSS (Rich Site Summary)` feed as an Atom feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml')
    >>> d.feed.subtitle_detail
    {'type': 'text/html',
    'base': 'http://feedparser.org/docs/examples/rss20.xml',
    'language': None,
    'value': u'For documentation <em>only</em>'}
    >>> len(d.entries)
    1
    >>> e = d.entries[0]
    >>> e.links
    [{'rel': 'alternate',
    'type': 'text/html',
    'href': u'http://example.org/item/1'}]
    >>> e.summary_detail
    {'type': 'text/html',
    'base': 'http://feedparser.org/docs/examples/rss20.xml',
    'language': u'en',
    'value': u'Watch out for <span>nasty tricks</span>'}
    >>> e.updated_parsed
    (2002, 9, 5, 0, 0, 1, 3, 248, 0)
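
A nine-field time tuple such as the ``updated_parsed`` value above is often easier to work with as a ``datetime`` object or a timestamp. The following is a minimal sketch using only the standard library. It assumes the tuple is expressed in UTC, and it hard-codes the tuple from the example instead of fetching the feed.

```python
import calendar
from datetime import datetime, timezone

# Tuple copied from the example output above; assumed to be in UTC.
updated_parsed = (2002, 9, 5, 0, 0, 1, 3, 248, 0)

# Option 1: build a timezone-aware datetime from the first six fields
# (year, month, day, hour, minute, second).
updated = datetime(*updated_parsed[:6], tzinfo=timezone.utc)
print(updated.isoformat())

# Option 2: convert to a POSIX timestamp. calendar.timegm interprets the
# tuple as UTC, unlike time.mktime, which assumes local time.
timestamp = calendar.timegm(updated_parsed)
print(timestamp)
```

Both results describe the same instant, so either form can be used for sorting or comparing entries from different feeds.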


.. note::

    For more examples of how :program:`Universal Feed Parser` normalizes
    content from different formats, see :ref:`annotated`.