File: examples.html

<html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
    <title>xml2: Examples</title>
    <link rel="stylesheet" type="text/css" href="style.css">
  </head>
  <body>
<h1>Examples</h1>

<p>These examples assume that common GNU tools (<em>wget</em>, <em>grep</em>, <em>cut</em>, <em>awk</em>, <em>sed</em>, ...) are installed.</p>

<h2>XML</h2>

<h4>Use the <a href="http://slashdot.org/">Slashdot</a> backend.</h4>

<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2</b>
/backslash/@xmlns:backslash=http://slashdot.org/backslash.dtd
/backslash/story/title=More on Athlon Overclocking
/backslash/story/url=http://slashdot.org/articles/00/03/04/1441248.shtml
/backslash/story/time=2000-03-05 03:40:47
/backslash/story/author=Hemos
/backslash/story/department=better-faster-strong
/backslash/story/topic=amd
/backslash/story/comments=56
/backslash/story/section=articles
/backslash/story/image=topicamd.gif
/backslash/story
/backslash/story/title=New Atari Jaguar Game Running $1,225 on eBay
/backslash/story/url=http://slashdot.org/articles/00/03/02/1430232.shtml
<b>...</b>
</pre>

<h4>Now, just the headlines.</h4>

<pre>% <b>wget -q -O - http://slashdot.org/slashdot.xml | xml2 | 
  grep story/title= | cut -d= -f 2-</b>
More on Athlon Overclocking
New Atari Jaguar Game Running $1,225 on eBay
AT&amp;T's Korn Shell Source Code Released
TheBench.org: Community Cartooning
OpenGL for Palm OS Environment
Banner Ads on Your Cell Phone
Burning Money on Open Source
Embedded OpenBSD Running the Stallion ePipe
Bezos Responds to Tim O'Reilly's Open Letter
Update on 'Blame Canada' and the Oscars
</pre>
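<p>The <em>cut</em> stage above asks for <code>-f 2-</code> rather than <code>-f 2</code>, which matters when a value itself contains an equals sign. The sketch below shows this on a small hand-written sample instead of a live feed. The sample lines and the <code>headlines</code> function name are illustrative assumptions, not real <em>xml2</em> output.</p>

```shell
# Hand-written sample in xml2's flat format (hypothetical data);
# the second title deliberately contains an '=' sign.
sample='/backslash/story/title=More on Athlon Overclocking
/backslash/story/url=http://slashdot.org/articles/00/03/04/1441248.shtml
/backslash/story
/backslash/story/title=Why 2+2=4 Matters'

# Select the title lines, then keep everything after the first '='.
# With "-f 2" alone, the second title would be cut short at its own '='.
headlines() {
  grep 'story/title=' | cut -d= -f 2-
}

printf '%s\n' "$sample" | headlines
```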

<h4>How big is the Red Hat 6.1 libxml RPM?</h4>

(For variety, we use <em>awk</em> rather than <em>grep</em> and <em>cut</em>.)

<pre>% <b>wget -q -O - http://rpmfind.net/linux/RDF/redhat/6.1/i386/libxml-1.4.0-1.i386.rdf | 
  xml2 | awk -F= '/RPM:Size/ {print $2}'</b>
704399
</pre>

<h4>What is the melting point of silicon?</h4>

More awkitude.  Don't let your CPU get hotter than this!

<pre>% <b>wget -q -O - http://metalab.unc.edu/xml/examples/periodic_table/allelements.xml | 
  xml2 | awk '/ATOM\/NAME=Silicon/,!/ATOM\//' | 
         awk -F\= '/MELTING_POINT/ {print $2}'</b>
Kelvin
1683
</pre>
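<p>The first <em>awk</em> in that pipeline is a range pattern: it starts printing at the Silicon name line and stops at the first line that no longer contains <code>ATOM/</code>, which is the bare separator line between elements. Here is a minimal sketch on a hand-written sample. The path names and the neighbouring elements' values are assumptions for illustration, and only the Silicon figures come from the output above.</p>

```shell
# Hand-written sample in xml2's flat format (hypothetical paths and values,
# except Silicon's, which matches the output shown above).
sample='/PERIODIC_TABLE/ATOM/NAME=Aluminum
/PERIODIC_TABLE/ATOM/MELTING_POINT/@UNITS=Kelvin
/PERIODIC_TABLE/ATOM/MELTING_POINT=933.5
/PERIODIC_TABLE/ATOM
/PERIODIC_TABLE/ATOM/NAME=Silicon
/PERIODIC_TABLE/ATOM/MELTING_POINT/@UNITS=Kelvin
/PERIODIC_TABLE/ATOM/MELTING_POINT=1683
/PERIODIC_TABLE/ATOM
/PERIODIC_TABLE/ATOM/NAME=Phosphorus
/PERIODIC_TABLE/ATOM/MELTING_POINT/@UNITS=Kelvin
/PERIODIC_TABLE/ATOM/MELTING_POINT=317.3'

# Stage 1: print from the Silicon name line through the first line
#          that lacks "ATOM/" (the separator before the next element).
# Stage 2: split on '=' and print the value of each MELTING_POINT line.
silicon_melting_point() {
  awk '/ATOM\/NAME=Silicon/,!/ATOM\//' | awk -F= '/MELTING_POINT/ {print $2}'
}

printf '%s\n' "$sample" | silicon_melting_point
```

<p>On this sample the function prints the units line followed by the value, the same two lines as the live run above.</p>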

<em>(1683 K is about 2570 °F, by the way.)</em>

<h2>HTML</h2>

<h4>Fetch the <a href="http://web.archive.org/web/20160322165048/http://slashdot.org/">Slashdot</a> news page.</h4>

You'll probably see some warnings.  (Slashdot has some of the worst HTML I've
ever seen...)

<pre>% <b>wget -q -O - http://slashdot.org/ | html2</b>
/html/head/title=Slashdot:News for Nerds. Stuff that Matters.
/html/head=
/html=
/html/body/@bgcolor=#000000
/html/body/@text=#000000
/html/body/@link=#006666
/html/body/@vlink=#000000
/html/body=
/html/body/center/a/@href=http://209.207.224.220/redir.pl?1789
/html/body/center/a/@target=_top
<b>...</b>
</pre>

<h4>Find all the links.</h4>

If you find the warnings distracting, redirect the standard error of 
<em>html2</em> to /dev/null.

<pre>% <b>wget -q -O - http://slashdot.org/ | html2 | grep 'a/@href' | 
  cut -d\= -f 2- | sort | uniq</b>
/about.shtml
/advertising.shtml
/article.pl?sid=99/03/31/0137221
/article.pl?sid=99/04/25/1438249
/article.pl?sid=99/04/27/0310247
/article.pl?sid=99/04/29/0124247
/article.pl?sid=99/08/24/1327256&amp;mode=thread
/awards.shtml
/cheesyportal.shtml
/code.shtml
<b>...</b>
</pre>
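<p>To silence the warnings, the redirect has to be attached to the <em>html2</em> stage itself, because <code>2&gt;/dev/null</code> only affects the command it follows. The sketch below uses a stand-in function, <code>fake_html2</code>, so that it runs without a network connection. The stand-in, its warning text, and the <code>links</code> function name are all hypothetical. In a real pipeline the same redirect would follow <em>html2</em>.</p>

```shell
# Stand-in for html2 (hypothetical): writes one flat line to stdout and
# one made-up parser warning to stderr.
fake_html2() {
  echo '/html/body/center/a/@href=/about.shtml'
  echo 'warning: unexpected end tag' >&2
}

# The redirect sits directly after the noisy stage; the rest of the
# pipeline is the same as in the example above.
links() {
  fake_html2 2>/dev/null | grep 'a/@href' | cut -d= -f 2- | sort | uniq
}

links
```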

<h4>Change some colors.</h4>

This pipeline uses both <em>html2</em> and <em>2html</em> to effect a 
round-trip.  In the middle, <em>sed</em> applies a transformation, turning
the background of every colored table on the page yellow.  Yuck, huh?

<pre>% <b>wget -q -O - http://slashdot.org/ | 
  html2 | sed 's|table/@bgcolor=\(.*\)$|table/@bgcolor=yellow|' | 
  2html &gt; slashdot.html</b>
% <b>netscape slashdot.html</b>
</pre>
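<p>The effect of the <em>sed</em> stage is easier to see on a few flat lines than on a whole page. The sketch below runs the same substitution over a hand-written sample. The sample lines and the <code>recolor_tables</code> function name are assumptions for illustration.</p>

```shell
# Hand-written sample in html2's flat format (hypothetical data).
sample='/html/body/@bgcolor=#000000
/html/body/table/@bgcolor=#006666
/html/body/table/tr/td=News'

# Same substitution as in the pipeline above: any bgcolor attribute on a
# table element is rewritten to yellow; every other line passes through.
recolor_tables() {
  sed 's|table/@bgcolor=\(.*\)$|table/@bgcolor=yellow|'
}

printf '%s\n' "$sample" | recolor_tables
```

<p>Only the middle line changes. The <code>body</code> attribute is left alone because its path does not contain <code>table/@bgcolor</code>.</p>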

<h4>Strip JavaScript from a 
<a href="http://web.archive.org/web/20160322165048/http://www.geocities.com/SiliconValley/Peaks/5957/xml.html">Geocities 
home page</a>.</h4>

Geocities uses JavaScript to create an annoying little brand popup in the 
corner of their members' home pages.  Let's delete it.

<pre>% <b>wget -q -O - http://www.geocities.com/SiliconValley/Peaks/5957/xml.html | 
  html2 | grep -vi '^[^=]*/script[/=]' | 
  2html &gt; xml.html</b>
% <b>netscape xml.html</b>
</pre>
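<p>The <em>grep</em> pattern above is anchored to the path part of each line: <code>^[^=]*</code> cannot cross the first equals sign, so text that merely mentions "/script" in a value is kept. Here is a minimal sketch on a hand-written sample. The sample lines and the <code>strip_scripts</code> function name are assumptions for illustration.</p>

```shell
# Hand-written sample in html2's flat format (hypothetical data).
sample='/html/head/title=My XML page
/html/head/script/@language=JavaScript
/html/head/script=window.open("popup.html");
/html/body/p=See /script/ handling = fun
/html/body/SCRIPT/@src=brand.js'

# Drop every line whose path (the part before the first '=') contains a
# script element, in any letter case; -v inverts the match, -i ignores case.
strip_scripts() {
  grep -vi '^[^=]*/script[/=]'
}

printf '%s\n' "$sample" | strip_scripts
```

<p>The title line and the paragraph line survive. The three script lines, including the upper-case one, are removed.</p>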

  

<hr>
<a href="">XML/Unix Processing Tools</a>