1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="author" content="John J. Lee <jjl@pobox.com>">
<meta name="date" content="2003-12">
<meta name="keywords" content="FAQ,cookie,HTTP,HTML,form,table,Python,web,client,client-side,sniffer,https,script,embedded">
<title>ClientCookie, ClientForm, ClientTable general FAQs</title>
<style type="text/css" media="screen">@import "../styles/style.css";</style>
<base href="http://wwwsearch.sourceforge.net/bits/clientx.html">
</head>
<body>
<div id="sf"><a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=48205&type=2"
width="125" height="37" alt="SourceForge.net Logo"></a></div>
<!--<img src="../images/sflogo.png"-->
<h1>Python web-client programming general FAQs</h1>
<div id="Content">
<ul>
<li>Is there any example code?
<p>There's (still!) a bit of a shortage of example code for ClientCookie
and ClientForm &co., because the stuff I've written tends to either
require access to restricted-access sites, or is proprietary code (and the
same goes for other people's code). I hope to write some good example
code soon. In the mean time, here's a crummy, ugly script:
<a href="./fastmail.py">fastmail.py</a> — requires
<a href="../ClientForm/">ClientCookie</a> and
<a href="../ClientForm/">ClientForm</a> 0.1.x (it also uses <a
href="../ClientTable/">ClientTable</a> if you have that installed, but I
haven't released the required 0.0.2a version yet).
<li>HTTPS on Windows?
<p>Use this <a href="http://pypgsql.sourceforge.net/misc/python22-win32-ssl.zip">
_socket.pyd</a>, or use Python 2.3.
<li>I want to see what my web browser is doing, but standard network sniffers
like <a href="http://www.ethereal.com/">ethereal</a> or netcat (nc) don't
work for HTTPS. How do I sniff HTTPS traffic?
<p>Three good options:
<ul>
<li>Mozilla plugin: <a href="http://livehttpheaders.mozdev.org/">
livehttpheaders</a>.
<li>Use <a href="http://lynx.browser.org/">lynx</a> <code>-trace</code>,
and filter out the junk with a script.
<li>There's also a commercial <a href="http://www.simtec.ltd.uk/">MSIE
plugin</a>.
</ul>
<p>I'm told you can also use a proxy like <a
href="http://www.proxomitron.info/">proxomitron</a> (never tried it
myself).
<li>Embedded script is messing up my web-scraping. What do I do?
<p>It is possible to embed script in HTML pages (sandwiched between
<code><SCRIPT>here</SCRIPT></code> tags, and in
<code>javascript:</code> URLs) - JavaScript / ECMAScript, VBScript, or
even Python. These scripts can do all sorts of things, including causing
cookies to be set in a browser, submitting or filling in parts of forms in
response to user actions, changing link colours as the mouse moves over a
link, etc.
<p>If you come across this in a page you want to automate, you
have three options. Here they are, roughly in order of simplicity.
<ul>
<li>Simply figure out what the embedded script is doing and emulate it
in your Python code: for example, by manually adding cookies to your
<code>CookieJar</code> instance, calling methods on
<code>HTMLForm</code>s, calling <code>urlopen</code>, etc.
<li>Dump ClientCookie and ClientForm and automate a browser instead
(eg. use MS Internet Explorer via its COM automation interfaces, using
the <a href="http://starship.python.net/crew/mhammond/">Python for
Windows extensions</a>, XXX Mozilla automation & XPCOM / PyXPCOM,
Konqueror & DCOP / KParts / PyKDE).
<li>Use Java's <a href="httpunit.sourceforge.net">httpunit</a> from
Jython, since it knows some JavaScript.
<li>Get ambitious and automatically delegate the work to an appropriate
interpreter (Mozilla's JavaScript interpreter, for instance). This
approach is the one taken by <a href="../DOMForm">DOMForm</a> (the
JavaScript support is "very alpha", though!).
</ul>
<li>Misc links
<ul>
<li>Another Java thing: <a href="http://maxq.tigris.org/">maxq</a>,
which provides a proxy to aid automatic generation of functional tests
written in Jython using the standard library unittest module (PyUnit)
and the "Jakarta Commons" HttpClient library.
<li>A useful set Zope-oriented links on <a href="XXX">tools for testing
web applications</a>.
<li>O'Reilly book: <a href="">Spidering Hacks</a>. Very Perl-oriented.
Watch this space: I may port the examples from the first chapter or two
to Python.
<li>Useful
<a href="http://chrispederick.myacen.com/work/firebird/webdeveloper/">
Mozilla plugin</a> which, amongst other things, can display HTML form
information and HTML table structure(thanks to Erno Kuusela for this
link).
</ul>
<li>Will any of this code make its way into the Python standard library?
<p>The request / response processing extensions to urllib2 from ClientCookie
are being merged into urllib2 for Python 2.4. I hope the cookies,
http-equiv and refresh code will be in there too, along with
<code>mechanize.UserAgent</code> (but <em>not</em>
<code>mechanize.Browser</code>). The rest, probably not.
</ul>
</div> <!--id="Content"-->
<p><a href="mailto:jjl@pobox.com">John J. Lee</a>, December 2003.
<hr>
</div>
<div id="Menu">
<a href="..">Home</a><br>
<!--<a href=""></a><br>-->
<br>
<a href="../ClientCookie">ClientCookie</a><br>
<a href="../ClientForm">ClientForm</a><br>
<a href="../DOMForm/">DOMForm</a><br>
<a href="../python-spidermonkey/">python-spidermonkey</a><br>
<a href="../ClientTable">ClientTable</a><br>
<a href="../mechanize/">mechanize</a><br>
<a href="../pullparser/">pullparser</a><br>
<span class="thispage">General FAQs</span><br>
<a href="./urllib2_152.py">1.5.2 urllib2.py</a><br>
<a href="./urllib_152.py">1.5.2 urllib.py</a><br>
<br>
</body>
</html>
|