File: scanYahoo.py

pyparsing 1.5.6+dfsg1-2
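# Scan a web page and print the text and target of every <a> link.
# Ported to Python 3: urllib.urlopen and the print statement exist only in Python 2.
from urllib.request import urlopen

from pyparsing import makeHTMLTags, SkipTo, htmlComment

# Fetch the page. Decoding as UTF-8 is an assumption about the server's encoding.
with urlopen("http://www.yahoo.com") as serverListPage:
    htmlText = serverListPage.read().decode("utf-8", errors="replace")

# Build expressions that match the opening and closing anchor tags.
aStart, aEnd = makeHTMLTags("A")

# A link is <a ...>, then everything up to </a> (named "link"), then </a>.
link = aStart + SkipTo(aEnd).setResultsName("link") + aEnd
link.ignore(htmlComment)  # skip HTML comments while matching

# scanString yields (tokens, start, end) for each match found in the text.
for toks, start, end in link.scanString(htmlText):
    print(toks.link, "->", toks.startA.href)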
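
The script above depends on a live network request, which makes it awkward to experiment with. As a minimal sketch, the same link grammar can be run against an inline HTML string instead. The `sample_html` markup, the `example.com` URLs, and the `extract_links` helper are hypothetical and exist only for illustration; the pyparsing calls are the ones the script already uses, written for Python 3.

```python
from pyparsing import makeHTMLTags, SkipTo, htmlComment

# Hypothetical sample markup standing in for a downloaded page.
sample_html = """
<html><body>
<!-- <a href="http://hidden.example/">commented out</a> -->
<a href="http://example.com/news">News</a>
<A HREF="http://example.com/mail" class="nav">Mail</A>
</body></html>
"""

# Same grammar as the script: opening tag, body text up to the closing tag, closing tag.
a_start, a_end = makeHTMLTags("A")
link_expr = a_start + SkipTo(a_end).setResultsName("link") + a_end
link_expr.ignore(htmlComment)


def extract_links(html):
    """Return a list of (link text, href) pairs for the anchors found in html."""
    return [
        (str(toks.link), str(toks.startA.href))
        for toks, _start, _end in link_expr.scanString(html)
    ]


found = extract_links(sample_html)
for text, href in found:
    print(text, "->", href)
```

Because `makeHTMLTags` matches tag names without regard to case and lowercases attribute names, the upper-case `<A HREF=...>` form is reported through the same `startA.href` result name as the lower-case one.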