File: common-atom-elements.rst

package info (click to toggle)
feedparser 6.0.12-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 10,540 kB
  • sloc: xml: 11,459; python: 4,575; makefile: 7
file content (130 lines) | stat: -rw-r--r-- 4,465 bytes parent folder | download | duplicates (7)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
Common Atom Elements
====================

Atom feeds generally contain more information than :abbr:`RSS (Rich Site Summary)`
feeds (because more elements are required), but the most commonly used elements
are still title, link, subtitle/description, various dates, and ID.

This sample Atom feed is at `http://feedparser.org/docs/examples/atom10.xml
<http://feedparser.org/docs/examples/atom10.xml>`_.

.. sourcecode:: xml

    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom"
    xml:base="http://example.org/"
    xml:lang="en">
    <title type="text">Sample Feed</title>
    <subtitle type="html">
    For documentation &lt;em&gt;only&lt;/em&gt;
    </subtitle>
    <link rel="alternate" href="/"/>
    <link rel="self"
    type="application/atom+xml"
    href="http://www.example.org/atom10.xml"/>
    <rights type="html">
    &lt;p>Copyright 2005, Mark Pilgrim&lt;/p>&lt;
    </rights>
    <id>tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml</id>
    <generator
    uri="http://example.org/generator/"
    version="4.0">
    Sample Toolkit
    </generator>
    <updated>2005-11-09T11:56:34Z</updated>
    <entry>
    <title>First entry title</title>
    <link rel="alternate"
    href="/entry/3"/>
    <link rel="related"
    type="text/html"
    href="http://search.example.com/"/>
    <link rel="via"
    type="text/html"
    href="http://toby.example.com/examples/atom10"/>
    <link rel="enclosure"
    type="video/mpeg4"
    href="http://www.example.com/movie.mp4"
    length="42301"/>
    <id>tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3</id>
    <published>2005-11-09T00:23:47Z</published>
    <updated>2005-11-09T11:56:34Z</updated>
    <summary type="text/plain" mode="escaped">Watch out for nasty tricks</summary>
    <content type="application/xhtml+xml" mode="xml"
    xml:base="http://example.org/entry/3" xml:lang="en-US">
    <div xmlns="http://www.w3.org/1999/xhtml">Watch out for
    <span style="background: url(javascript:window.location='http://example.org/')">
    nasty tricks</span></div>
    </content>
    </entry>
    </feed>

The feed elements are available in ``d.feed``.

Accessing Common Feed Elements
------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d.feed.title
    u'Sample feed'
    >>> d.feed.link
    u'http://example.org/'
    >>> d.feed.subtitle
    u'For documentation <em>only</em>'
    >>> d.feed.updated
    u'2005-11-09T11:56:34Z'
    >>> d.feed.updated_parsed
    (2005, 11, 9, 11, 56, 34, 2, 313, 0)
    >>> d.feed.id
    u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml'

Entries are available in ``d.entries``, which is a list. You access entries in
the order in which they appear in the original feed, so the first entry is
``d.entries[0]``.

Accessing Common Entry Elements
-------------------------------

::

    >>> import feedparser
    >>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
    >>> d.entries[0].title
    u'First entry title'
    >>> d.entries[0].link
    u'http://example.org/entry/3
    >>> d.entries[0].id
    u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3'
    >>> d.entries[0].published
    u'2005-11-09T00:23:47Z'
    >>> d.entries[0].published_parsed
    (2005, 11, 9, 0, 23, 47, 2, 313, 0)
    >>> d.entries[0].updated
    u'2005-11-09T11:56:34Z'
    >>> d.entries[0].updated_parsed
    (2005, 11, 9, 11, 56, 34, 2, 313, 0)
    >>> d.entries[0].summary
    u'Watch out for nasty tricks'
    >>> d.entries[0].content
    [{'type': u'application/xhtml+xml',
    'base': u'http://example.org/entry/3',
    'language': u'en-US',
    'value': u'<div>Watch out for <span>nasty tricks</span></div>'}]

.. note::

    The parsed summary and content are not the same as they appear in the
    original feed. The original elements contained dangerous :abbr:`HTML
    (HyperText Markup Language)` markup which was sanitized. See
    :ref:`advanced.sanitization` for details.

Because Atom entries can have more than one content element,
``d.entries[0].content`` is a list of dictionaries. Each dictionary contains
metadata about a single content element. The two most important values in the
dictionary are the content type, in ``d.entries[0].content[0].type``, and the
actual content value, in ``d.entries[0].content[0].value``.

You can get this level of detail on other Atom elements too.