.. _advanced.normalization:

Content Normalization
=====================

:program:`Universal Feed Parser` can parse many different types of feeds: Atom,
:abbr:`CDF (Channel Definition Format)`, and nine different versions of
:abbr:`RSS (Rich Site Summary)`. You should not be forced to learn the
differences between these formats. :program:`Universal Feed Parser` does its
best to ensure that you can treat all feeds the same way, regardless of format
or version.
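
Much of this normalization amounts to letting several names refer to the same
piece of data. The examples below show ``channel`` and ``feed``, and ``items``
and ``entries``, returning the same values. The following is a minimal,
hypothetical sketch of that idea using key aliases. ``AliasDict`` and
``ALIASES`` are invented for illustration and are not Universal Feed Parser's
internal implementation. A real parser must also handle context-dependent
mappings that this toy table ignores. For instance, a feed-level
``description`` corresponds to ``subtitle``, while an entry-level one
corresponds to ``summary``.

::

    # Hypothetical alias table: RSS-style name -> Atom-style name.
    # Illustrative only; not the parser's actual key mapping.
    ALIASES = {
        "channel": "feed",
        "items": "entries",
        "description": "summary",
    }


    class AliasDict(dict):
        """A dict that also answers to alternate (aliased) key names."""

        def _resolve(self, key):
            # Use the aliased name only when the key itself is absent.
            if not dict.__contains__(self, key) and key in ALIASES:
                return ALIASES[key]
            return key

        def __getitem__(self, key):
            return dict.__getitem__(self, self._resolve(key))

        def __contains__(self, key):
            return dict.__contains__(self, self._resolve(key))


    entry = AliasDict(title="First entry title",
                      summary="Watch out for nasty tricks")
    result = AliasDict(feed=AliasDict(title="Sample Feed"), entries=[entry])

    print(result["channel"]["title"])         # Sample Feed
    print(result["items"][0]["description"])  # Watch out for nasty tricks

Attribute-style access such as ``d.feed.title`` would also need
``__getattr__``. The sketch keeps to item access for brevity.
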

You can access the basic elements of an Atom feed using :abbr:`RSS (Rich Site Summary)` terminology.

Accessing an Atom feed as an :abbr:`RSS (Rich Site Summary)` feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d['channel']['title']
    u'Sample Feed'
    >>> d['channel']['link']
    u'http://example.org/'
    >>> d['channel']['description']
    u'For documentation <em>only</em>'
    >>> len(d['items'])
    1
    >>> e = d['items'][0]
    >>> e['title']
    u'First entry title'
    >>> e['link']
    u'http://example.org/entry/3'
    >>> e['description']
    u'Watch out for nasty tricks'
    >>> e['author']
    u'Mark Pilgrim (mark@example.org)'


The same thing works in reverse: you can access :abbr:`RSS (Rich Site Summary)` feeds as if they were Atom feeds.

Accessing an :abbr:`RSS (Rich Site Summary)` feed as an Atom feed
-----------------------------------------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/rss20.xml')
    >>> d.feed.subtitle_detail
    {'type': 'text/html',
     'base': 'http://feedparser.org/docs/examples/rss20.xml',
     'language': None,
     'value': u'For documentation <em>only</em>'}
    >>> len(d.entries)
    1
    >>> e = d.entries[0]
    >>> e.links
    [{'rel': 'alternate',
      'type': 'text/html',
      'href': u'http://example.org/item/1'}]
    >>> e.summary_detail
    {'type': 'text/html',
     'base': 'http://feedparser.org/docs/examples/rss20.xml',
     'language': u'en',
     'value': u'Watch out for <span>nasty tricks</span>'}
    >>> e.updated_parsed
    (2002, 9, 5, 0, 0, 1, 3, 248, 0)

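
Normalized dates such as ``updated_parsed`` come back as a nine-element time
tuple rather than a ``datetime.datetime``. If you need a ``datetime``, one
approach is the standard library's ``calendar.timegm``, which reads the tuple
as UTC. This is a sketch that assumes the tuple is expressed in UTC.
``to_datetime`` is a hypothetical helper name, not part of
:program:`Universal Feed Parser`.

::

    import calendar
    from datetime import datetime, timezone


    def to_datetime(parsed):
        """Convert a 9-tuple (or time.struct_time) to an aware UTC datetime.

        Hypothetical helper; assumes the tuple is expressed in UTC.
        """
        # calendar.timegm treats the tuple as UTC (time.mktime would
        # treat it as local time instead).
        seconds = calendar.timegm(parsed)
        return datetime.fromtimestamp(seconds, tz=timezone.utc)


    # The value of e.updated_parsed from the example above.
    updated = to_datetime((2002, 9, 5, 0, 0, 1, 3, 248, 0))
    print(updated.isoformat())  # 2002-09-05T00:00:01+00:00

Avoid ``time.mktime`` for this conversion: it interprets the tuple as local
time, which shifts the result by your UTC offset.
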
.. note::
For more examples of how :program:`Universal Feed Parser` normalizes
content from different formats, see :ref:`annotated`.