<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Common Atom Elements [Universal Feed Parser]</title>
<link rel="stylesheet" href="feedparser.css" type="text/css">
<link rev="made" href="mailto:mark@diveintomark.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.65.1">
<meta name="keywords" content="RSS, Atom, CDF, XML, feed, parser, Python">
<link rel="start" href="index.html" title="Documentation">
<link rel="up" href="basic.html" title="Basic Features">
<link rel="prev" href="common-rss-elements.html" title="Common RSS Elements">
<link rel="next" href="atom-detail.html" title="Getting Detailed Information on Atom Elements">
</head>
<body id="feedparser-org" class="docs">
<div class="z" id="intro"><div class="sectionInner"><div class="sectionInner2">
<div class="s" id="pageHeader">
<h1><a href="/"><span>Universal Feed Parser</span></a></h1>
<p><span>Parse RSS and Atom feeds in Python.  3000 unit tests.  Open source.</span></p>
</div>
<div class="s" id="quickSummary"><ul>
<li class="li1">
<a href="http://sourceforge.net/projects/feedparser/"><span>Download</span></a> ·</li>
<li class="li2">
<a href="http://feedparser.org/docs/"><span>Documentation</span></a> ·</li>
<li class="li3">
<a href="http://feedparser.org/tests/"><span>Unit tests</span></a> ·</li>
<li class="li4"><a href="http://sourceforge.net/tracker/?func=browse&amp;group_id=112328&amp;atid=661937"><span>Report a bug</span></a></li>
</ul></div>
</div></div></div>
<div id="main"><div id="mainInner">
<div class="section" lang="en">
<div class="titlepage">
<div><div><h2 class="title">
<a name="basic.atom" class="skip" href="#basic.atom" title="link to this section"><img src="images/permalink.gif" alt="[link]" title="link to this section" width="8" height="9"></a> Common Atom Elements</h2></div></div>
<div></div>
</div>
<div class="abstract"><p>Atom feeds generally contain more information than <acronym title="Rich Site Summary">RSS</acronym> feeds (because more elements are required), but the most commonly used elements are still title, link, subtitle/description, various dates, and ID.</p></div>
<p>This sample Atom feed is at <a href="http://feedparser.org/docs/examples/atom10.xml">http://feedparser.org/docs/examples/atom10.xml</a>.</p>
<div class="informalexample"><pre class="programlisting ">&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://example.org/"
      xml:lang="en"&gt;
  &lt;title type="text"&gt;Sample Feed&lt;/title&gt;
  &lt;subtitle type="html"&gt;
    For documentation &amp;lt;em&amp;gt;only&amp;lt;/em&amp;gt;
  &lt;/subtitle&gt;
  &lt;link rel="alternate" href="/"/&gt;
  &lt;link rel="self"
      type="application/atom+xml"
      href="http://www.example.org/atom10.xml"/&gt;
  &lt;rights type="html"&gt;
      &amp;lt;p&amp;gt;Copyright 2005, Mark Pilgrim&amp;lt;/p&amp;gt;
  &lt;/rights&gt;
  &lt;id&gt;tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml&lt;/id&gt;
  &lt;generator
      uri="http://example.org/generator/"
      version="4.0"&gt;
          Sample Toolkit
  &lt;/generator&gt;
  &lt;updated&gt;2005-11-09T11:56:34Z&lt;/updated&gt;
  &lt;entry&gt;
    &lt;title&gt;First entry title&lt;/title&gt;
    &lt;link rel="alternate"
        href="/entry/3"/&gt;
    &lt;link rel="related"
        type="text/html"
        href="http://search.example.com/"/&gt;
    &lt;link rel="via"
        type="text/html"
        href="http://toby.example.com/examples/atom10"/&gt;
    &lt;link rel="enclosure"
        type="video/mpeg4"
        href="http://www.example.com/movie.mp4"
        length="42301"/&gt;
    &lt;id&gt;tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3&lt;/id&gt;
    &lt;published&gt;2005-11-09T00:23:47Z&lt;/published&gt;
    &lt;updated&gt;2005-11-09T11:56:34Z&lt;/updated&gt;
    &lt;summary type="text/plain" mode="escaped"&gt;Watch out for nasty tricks&lt;/summary&gt;
    &lt;content type="application/xhtml+xml" mode="xml"
             xml:base="http://example.org/entry/3" xml:lang="en-US"&gt;
      &lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;Watch out for
      &lt;span style="background: url(javascript:window.location='http://example.org/')"&gt;
      nasty tricks&lt;/span&gt;&lt;/div&gt;
    &lt;/content&gt;
  &lt;/entry&gt;
&lt;/feed&gt;</pre></div>
<p>The <tt class="sgmltag-element">feed</tt> elements are available in <tt class="varname">d.feed</tt>.</p>
<div class="example">
<a name="example.atom.feed" class="skip" href="#example.atom.feed" title="link to this example"><img src="images/permalink.gif" alt="[link]" title="link to this example" width="8" height="9"></a> <h3 class="title">Example: Accessing Common Feed Elements</h3>
<pre class="screen"><tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput"><font color='navy'><b>import</b></font> feedparser</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d = feedparser.parse('<a href="http://feedparser.org/docs/examples/atom10.xml">http://feedparser.org/docs/examples/atom10.xml</a>')</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.title</span>
<span class="computeroutput">u'Sample Feed'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.link</span>
<span class="computeroutput">u'http://example.org/'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.subtitle</span>
<span class="computeroutput">u'For documentation &lt;em&gt;only&lt;/em&gt;'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.updated</span>
<span class="computeroutput">u'2005-11-09T11:56:34Z'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.updated_parsed</span>
<span class="computeroutput">(2005, 11, 9, 11, 56, 34, 2, 313, 0)</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.feed.id</span>
<span class="computeroutput">u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml'</span></pre>
</div>
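<p>The sample feed gives its <tt class="sgmltag-element">link</tt> as the relative reference <tt class="literal">/</tt>, while <tt class="varname">d.feed.link</tt> comes back as an absolute <acronym title="Uniform Resource Locator">URL</acronym>. The difference is explained by the <tt class="literal">xml:base</tt> attribute on the <tt class="sgmltag-element">feed</tt> element. The following is a minimal sketch of that kind of resolution using only the Python standard library; it illustrates the idea and is not a copy of the parser's own code. The variable names are chosen for this illustration.</p>

```python
from urllib.parse import urljoin

# Values copied by hand from the sample feed above.
feed_base = 'http://example.org/'   # xml:base on the feed element
feed_href = '/'                     # href of the feed's rel="alternate" link
entry_href = '/entry/3'             # href of the entry's rel="alternate" link

# Resolve each relative reference against the base URL.
feed_link = urljoin(feed_base, feed_href)
entry_link = urljoin(feed_base, entry_href)

print(feed_link)    # http://example.org/
print(entry_link)   # http://example.org/entry/3
```

<p>The first result matches the value of <tt class="varname">d.feed.link</tt> shown above.</p>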
<p>Entries are available in <tt class="varname">d.entries</tt>, which is a list.  The list preserves the order in which the entries appear in the original feed, so the first entry is <tt class="varname">d.entries[0]</tt>.</p>
<div class="example">
<a name="example.atom.entry" class="skip" href="#example.atom.entry" title="link to this example"><img src="images/permalink.gif" alt="[link]" title="link to this example" width="8" height="9"></a> <h3 class="title">Example: Accessing Common Entry Elements</h3>
<pre class="screen"><tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput"><font color='navy'><b>import</b></font> feedparser</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d = feedparser.parse('<a href="http://feedparser.org/docs/examples/atom10.xml">http://feedparser.org/docs/examples/atom10.xml</a>')</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].title</span>
<span class="computeroutput">u'First entry title'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].link</span>
<span class="computeroutput">u'http://example.org/entry/3'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].id</span>
<span class="computeroutput">u'tag:feedparser.org,2005-11-09:/docs/examples/atom10.xml:3'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].published</span>
<span class="computeroutput">u'2005-11-09T00:23:47Z'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].published_parsed</span>
<span class="computeroutput">(2005, 11, 9, 0, 23, 47, 2, 313, 0)</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].updated</span>
<span class="computeroutput">u'2005-11-09T11:56:34Z'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].updated_parsed</span>
<span class="computeroutput">(2005, 11, 9, 11, 56, 34, 2, 313, 0)</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].summary</span>
<span class="computeroutput">u'Watch out for nasty tricks'</span>
<tt class="prompt">&gt;&gt;&gt; </tt><span class="userinput">d.entries[0].content</span>
<span class="computeroutput">[{'type': u'application/xhtml+xml',
 'base': u'http://example.org/entry/3',
 'language': u'en-US',
 'value': u'&lt;div&gt;Watch out for &lt;span&gt;nasty tricks&lt;/span&gt;&lt;/div&gt;'}]</span></pre>
</div>
<a name="id4952545"></a><table class="note" border="0" summary="">
<tr><td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"></td></tr>
<tr><td colspan="2" align="left" valign="top" width="99%">The parsed <tt class="sgmltag-element">content</tt> is not the same as it appears in the original feed.  The original element contained dangerous <acronym title="HyperText Markup Language">HTML</acronym> markup (a <tt class="literal">style</tt> attribute with a <tt class="literal">javascript:</tt> <acronym title="Uniform Resource Locator">URL</acronym>), which was sanitized.  The <tt class="sgmltag-element">summary</tt> in this feed is plain text and is returned unchanged.  See <a href="html-sanitization.html" title="HTML Sanitization">HTML Sanitization</a> for details.</td></tr>
</table>
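<p>The <tt class="varname">updated_parsed</tt> and <tt class="varname">published_parsed</tt> values shown above are 9-tuples in the style of the standard <tt class="literal">time</tt> module. If you would rather work with <tt class="literal">datetime</tt> objects, a conversion along these lines may be useful. This is a sketch that assumes the tuple is expressed in <acronym title="Coordinated Universal Time">UTC</acronym>; <tt class="function">to_datetime</tt> is a hypothetical helper written for this example, not part of the parser.</p>

```python
import calendar
from datetime import datetime, timezone

# The 9-tuple shown above for updated_parsed, copied by hand so that this
# sketch runs without fetching the feed.
updated_parsed = (2005, 11, 9, 11, 56, 34, 2, 313, 0)

def to_datetime(parsed):
    """Hypothetical helper: build an aware UTC datetime from a 9-tuple.

    Assumes the tuple is already in UTC.
    """
    # calendar.timegm interprets the tuple as UTC and returns epoch seconds.
    seconds = calendar.timegm(parsed)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

updated = to_datetime(updated_parsed)
print(updated.isoformat())   # 2005-11-09T11:56:34+00:00
```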
<p>Because Atom entries can have more than one <tt class="sgmltag-element">content</tt> element, <tt class="varname">d.entries[0].content</tt> is a list of dictionaries.  Each dictionary contains metadata about a single <tt class="sgmltag-element">content</tt> element.  The two most important values in the dictionary are the content type, in <tt class="varname">d.entries[0].content[0].type</tt>, and the actual content value, in <tt class="varname">d.entries[0].content[0].value</tt>.</p>
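<p>Since the list may hold several items, code that consumes it usually has to choose one. The sketch below shows one possible way to do that by content type. It uses plain dictionaries that mirror the output shown earlier, so it runs without the parser installed; <tt class="function">pick_content</tt> and its default preference order are assumptions made for this example.</p>

```python
# A plain list of dictionaries standing in for d.entries[0].content,
# with the values copied from the example output above.
content = [
    {'type': 'application/xhtml+xml',
     'base': 'http://example.org/entry/3',
     'language': 'en-US',
     'value': '<div>Watch out for <span>nasty tricks</span></div>'},
]

def pick_content(items,
                 preferred=('text/html', 'application/xhtml+xml', 'text/plain')):
    """Hypothetical helper: return the item with the most preferred type."""
    for wanted in preferred:
        for item in items:
            if item.get('type') == wanted:
                return item
    # No preferred type found: fall back to the first item, if there is one.
    return items[0] if items else None

best = pick_content(content)
print(best['type'])
print(best['value'])
```

<p>With a real parse result, the same function could be called as <tt class="literal">pick_content(d.entries[0].content)</tt>.</p>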
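<p>The same level of detail is available for other Atom elements, such as <tt class="sgmltag-element">title</tt> and <tt class="sgmltag-element">summary</tt>; see <a href="atom-detail.html" title="Getting Detailed Information on Atom Elements">Getting Detailed Information on Atom Elements</a>.</p>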
</div>
<div style="float: left">← <a class="NavigationArrow" href="common-rss-elements.html">Common RSS Elements</a>
</div>
<div style="text-align: right">
<a class="NavigationArrow" href="atom-detail.html">Getting Detailed Information on Atom Elements</a> →</div>
<hr style="clear:both">
<div class="footer"><p class="copyright">Copyright © 2004, 2005, 2006 Mark Pilgrim</p></div>
</div></div>
</body>
</html>