1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Content Normalization [Universal Feed Parser]</title>
<link rel="stylesheet" href="feedparser.css" type="text/css">
<link rev="made" href="mailto:mark@diveintomark.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.65.1">
<meta name="keywords" content="RSS, Atom, CDF, XML, feed, parser, Python">
<link rel="start" href="index.html" title="Documentation">
<link rel="up" href="advanced.html" title="Advanced Features">
<link rel="prev" href="html-sanitization.html" title="HTML Sanitization">
<link rel="next" href="namespace-handling.html" title="Namespace Handling">
</head>
<body id="feedparser-org" class="docs">
<div class="z" id="intro"><div class="sectionInner"><div class="sectionInner2">
<div class="s" id="pageHeader">
<h1><a href="/"><span>Universal Feed Parser</span></a></h1>
<p><span>Parse RSS and Atom feeds in Python. 3000 unit tests. Open source.</span></p>
</div>
<div class="s" id="quickSummary"><ul>
<li class="li1">
<a href="http://sourceforge.net/projects/feedparser/"><span>Download</span></a> ·</li>
<li class="li2">
<a href="http://feedparser.org/docs/"><span>Documentation</span></a> ·</li>
<li class="li3">
<a href="http://feedparser.org/tests/"><span>Unit tests</span></a> ·</li>
<li class="li4"><a href="http://sourceforge.net/tracker/?func=browse&group_id=112328&atid=661937"><span>Report a bug</span></a></li>
</ul></div>
</div></div></div>
<div id="main"><div id="mainInner">
<p id="breadcrumb">You are here: <a href="index.html">Documentation</a> → <a href="advanced.html">Advanced Features</a> → <span class="thispage">Content Normalization</span></p>
<div class="section" lang="en">
<div class="titlepage">
<div>
<div><h2 class="title">
<a name="advanced.normalization" class="skip" href="#advanced.normalization" title="link to this section"><img src="images/permalink.gif" alt="[link]" title="link to this section" width="8" height="9"></a> Content Normalization</h2></div>
<div><div class="abstract">
<h3 class="title"></h3>
<p><span class="application">Universal Feed Parser</span> can parse many different types of feeds: Atom, <acronym title="Channel Definition Format">CDF</acronym>, and nine different versions of <acronym title="Rich Site Summary">RSS</acronym>. You should not be forced to learn the differences between these formats. <span class="application">Universal Feed Parser</span> does its best to ensure that you can treat all feeds the same way, regardless of format or version.</p>
</div></div>
</div>
<div></div>
</div>
<p>You can access the basic elements of an Atom feed using <acronym title="Rich Site Summary">RSS</acronym> terminology.</p>
<div class="example">
<a name="example.atom.as.rss" class="skip" href="#example.atom.as.rss" title="link to this example"><img src="images/permalink.gif" alt="[link]" title="link to this example" width="8" height="9"></a> <h3 class="title">Example: Accessing an Atom feed as an <acronym title="Rich Site Summary">RSS</acronym> feed</h3>
<pre class="screen"><tt class="prompt">>>> </tt><span class="userinput"><font color='navy'><b>import</b></font> feedparser</span>
<tt class="prompt">>>> </tt><span class="userinput">d = feedparser.parse('<a href="http://feedparser.org/docs/examples/atom10.xml">http://feedparser.org/docs/examples/atom10.xml</a>')</span>
<tt class="prompt">>>> </tt><span class="userinput">d[<font color='olive'>'channel'</font>][<font color='olive'>'title'</font>]</span>
<span class="computeroutput">u'Sample Feed'</span>
<tt class="prompt">>>> </tt><span class="userinput">d[<font color='olive'>'channel'</font>][<font color='olive'>'link'</font>]</span>
<span class="computeroutput">u'http://example.org/'</span>
<tt class="prompt">>>> </tt><span class="userinput">d[<font color='olive'>'channel'</font>][<font color='olive'>'description'</font>]</span>
<span class="computeroutput">u'For documentation <em>only</em></span>
<tt class="prompt">>>> </tt><span class="userinput">len(d[<font color='olive'>'items'</font>])</span>
<span class="computeroutput">1</span>
<tt class="prompt">>>> </tt><span class="userinput">e = d[<font color='olive'>'items'</font>][0]</span>
<tt class="prompt">>>> </tt><span class="userinput">e[<font color='olive'>'title'</font>]</span>
<span class="computeroutput">u'First entry title'</span>
<tt class="prompt">>>> </tt><span class="userinput">e[<font color='olive'>'link'</font>]</span>
<span class="computeroutput">u'http://example.org/entry/3'</span>
<tt class="prompt">>>> </tt><span class="userinput">e[<font color='olive'>'description'</font>]</span>
<span class="computeroutput">u'Watch out for nasty tricks'</span>
<tt class="prompt">>>> </tt><span class="userinput">e[<font color='olive'>'author'</font>]</span>
<span class="computeroutput">u'Mark Pilgrim (mark@example.org)'</span></pre>
</div>
<p>The same thing works in reverse: you can access <acronym title="Rich Site Summary">RSS</acronym> feeds as if they were Atom feeds.</p>
<div class="example">
<a name="example.rss.as.atom" class="skip" href="#example.rss.as.atom" title="link to this example"><img src="images/permalink.gif" alt="[link]" title="link to this example" width="8" height="9"></a> <h3 class="title">Example: Accessing an <acronym title="Rich Site Summary">RSS</acronym> feed as an Atom feed</h3>
<pre class="screen"><tt class="prompt">>>> </tt><span class="userinput"><font color='navy'><b>import</b></font> feedparser</span>
<tt class="prompt">>>> </tt><span class="userinput">d = feedparser.parse('<a href="http://feedparser.org/docs/examples/rss20.xml">http://feedparser.org/docs/examples/rss20.xml</a>')</span>
<tt class="prompt">>>> </tt><span class="userinput">d.feed.subtitle_detail</span>
<span class="computeroutput">{'type': 'text/html',
'base': 'http://feedparser.org/docs/examples/rss20.xml',
'language': None,
'value': u'For documentation <em>only</em>'}</span>
<tt class="prompt">>>> </tt><span class="userinput">len(d.entries)</span>
<span class="computeroutput">1</span>
<tt class="prompt">>>> </tt><span class="userinput">e = d.entries[0]</span>
<tt class="prompt">>>> </tt><span class="userinput">e.links</span>
<span class="computeroutput">[{'rel': 'alternate',
'type': 'text/html',
'href': u'http://example.org/item/1'}]</span>
<tt class="prompt">>>> </tt><span class="userinput">e.summary_detail</span>
<span class="computeroutput">{'type': 'text/html',
'base': 'http://feedparser.org/docs/examples/rss20.xml',
'language': u'en',
'value': u'Watch out for <span>nasty tricks</span>'}</span>
<tt class="prompt">>>> </tt><span class="userinput">e.updated_parsed</span>
<span class="computeroutput">(2002, 9, 5, 0, 0, 1, 3, 248, 0)</span></pre>
</div>
<a name="id4956983"></a><table class="note" border="0" summary="">
<tr><td rowspan="2" align="center" valign="top" width="1%"><img src="images/note.png" alt="Note" title="" width="24" height="24"></td></tr>
<tr><td colspan="2" align="left" valign="top" width="99%">For more examples of how <span class="application">Universal Feed Parser</span> normalizes content from different formats, see <a href="annotated-examples.html" title="Annotated Examples">Annotated Examples</a>.</td></tr>
</table>
</div>
<div style="float: left">← <a class="NavigationArrow" href="html-sanitization.html">HTML Sanitization</a>
</div>
<div style="text-align: right">
<a class="NavigationArrow" href="namespace-handling.html">Namespace Handling</a> →</div>
<hr style="clear:both">
<div class="footer"><p class="copyright">Copyright © 2004, 2005, 2006 Mark Pilgrim</p></div>
</div></div>
</body>
</html>
|