1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192
|
= Mechanize examples
Note: Several examples show methods chained to the end of do/end blocks.
<code>do...end</code> is the same as curly braces (<code>{...}</code>). For
example, <code>do ... end.submit</code> is the same as <code>{ ...
}.submit</code>.
== Google
require 'rubygems'
require 'mechanize'
a = Mechanize.new { |agent|
agent.user_agent_alias = 'Mac Safari'
}
a.get('http://google.com/') do |page|
search_result = page.form_with(:id => 'gbqf') do |search|
search.q = 'Hello world'
end.submit
search_result.links.each do |link|
puts link.text
end
end
== Rubyforge
require 'rubygems'
require 'mechanize'
a = Mechanize.new
a.get('http://rubyforge.org/') do |page|
# Click the login link
login_page = a.click(page.link_with(:text => /Log In/))
# Submit the login form
my_page = login_page.form_with(:action => '/account/login.php') do |f|
f.form_loginname = ARGV[0]
f.form_pw = ARGV[1]
end.click_button
my_page.links.each do |link|
text = link.text.strip
next unless text.length > 0
puts text
end
end
== File Upload
Upload a file to flickr.
require 'rubygems'
require 'mechanize'
abort "#{$0} login passwd filename" if (ARGV.size != 3)
a = Mechanize.new { |agent|
# Flickr refreshes after login
agent.follow_meta_refresh = true
}
a.get('http://flickr.com/') do |home_page|
signin_page = a.click(home_page.link_with(:text => /Sign In/))
my_page = signin_page.form_with(:name => 'login_form') do |form|
form.login = ARGV[0]
form.passwd = ARGV[1]
end.submit
# Click the upload link
upload_page = a.click(my_page.link_with(:text => /Upload/))
# We want the basic upload page.
upload_page = a.click(upload_page.link_with(:text => /basic Uploader/))
# Upload the file
upload_page.form_with(:method => 'POST') do |upload_form|
upload_form.file_uploads.first.file_name = ARGV[2]
end.submit
end
== Pluggable Parsers
Let's say you want HTML pages to automatically be parsed with Rubyful Soup.
This example shows you how:
require 'rubygems'
require 'mechanize'
require 'rubyful_soup'
class SoupParser < Mechanize::Page
attr_reader :soup
def initialize(uri = nil, response = nil, body = nil, code = nil)
@soup = BeautifulSoup.new(body)
super(uri, response, body, code)
end
end
agent = Mechanize.new
agent.pluggable_parser.html = SoupParser
Now all HTML pages will be parsed with the SoupParser class, and automatically
give you access to a method called 'soup' where you can get access to the
Beautiful Soup for that page.
== Using a proxy
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.set_proxy 'localhost', 8000
page = agent.get(ARGV[0])
puts page.body
== The transact method
Mechanize#transact runs the given block and then resets the page history. I.e.
after the block has been executed, you're back at the original page; no need
to count how many times to call the back method at the end of a loop (while
accounting for possible exceptions).
This example also demonstrates subclassing Mechanize.
require 'rubygems'
require 'mechanize'
class TestMech < Mechanize
def process
get 'http://rubyforge.org/'
search_form = page.forms.first
search_form.words = 'WWW'
submit search_form
page.links_with(:href => %r{/projects/} ).each do |link|
next if link.href =~ %r{/projects/support/}
puts 'Loading %-30s %s' % [link.href, link.text]
begin
transact do
click link
# Do stuff, maybe click more links.
end
# Now we're back at the original page.
rescue => e
$stderr.puts "#{e.class}: #{e.message}"
end
end
end
end
TestMech.new.process
== Client Certificate Authentication (Mutual Auth)
In most cases a client certificate is created as an additional layer of
security for certain websites. The specific case that this was initially
tested on was for automating the download of archived images from a banks
(Wachovia) lockbox system. Once the certificate is installed into your
browser you will have to export it and split the certificate and private key
into separate files.
require 'rubygems'
require 'mechanize'
# create Mechanize instance
agent = Mechanize.new
# set the path of the certificate file
agent.cert = 'example.cer'
# set the path of the private key file
agent.key = 'example.key'
# get the login form & fill it out with the username/password
login_form = agent.get("http://example.com/login_page").form('Login')
login_form.Userid = 'TestUser'
login_form.Password = 'TestPassword'
# submit login form
agent.submit(login_form, login_form.buttons.first)
Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands
for PKCS #12. You can convert them from p12 to pem format by using the
following commands:
openssl pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
openssl pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
|