<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
  <head>
    <title>
      1. Sitescooper README
    </title>
  </head>
  <body bgcolor="#ffffff" text="#000000" link="#3300cc" vlink="#660066">
    <h1>
      1. Sitescooper README
    </h1>
    <p>
      This is sitescooper, a Perl script that you run on the machine
      you HotSync your Palm Computing handheld organizer with. It
      retrieves news stories automatically from various news
      websites and converts them into Palm DOC, iSilo, RichReader or
      text format; it can also convert into any other
      format for which you have a conversion program that takes
      text or HTML input.
    </p>
    <p>

      (If you've just installed sitescooper, you probably don't want to
      read the blurb again; so just go straight to the
      Installation page.)

    </p>
    <p>
      HTTP and local files, using the file:/// protocol, are both
      supported.
    </p>
    <p>
      Multiple types of sites can be snarfed:
    </p>
    <blockquote>
      1-level sites, where the text to be converted is all present
      on one page (such as Slashdot, Linux Weekly News, BluesNews,
      NTKnow, Ars Technica);
    </blockquote>
    <blockquote>
      2-level sites, where the text to be converted is linked to
      from a Table of Contents page (such as Wired News, BBC News,
      and I, Cringely);
    </blockquote>
    <blockquote>
      3-level sites, where the text to be converted is linked to
      from a Table of Contents page, which in turn is linked to
      from a list-of-issues page (such as PalmPower or New Scientist).
    </blockquote>
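    <p>
      To make the level idea concrete, here is a minimal sketch in
      Python. It is not sitescooper's own code (which is Perl and
      driven by site files); the function name, the
      <tt>links_of</tt> callback and the in-memory <tt>SITE</tt>
      table are all made up for illustration, standing in for real
      page fetches.
    </p>

```python
from typing import Callable, Dict, List


def collect_story_pages(start: str, levels: int,
                        links_of: Callable[[str], List[str]]) -> List[str]:
    """Return the pages that hold story text for a site with `levels` levels.

    levels=1: the start page itself holds the text.
    levels=2: the start page is a table of contents linking to stories.
    levels=3: the start page lists issues, each linking to a table of contents.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    pages = [start]
    # Each extra level means following links one step further down.
    for _ in range(levels - 1):
        next_pages: List[str] = []
        for page in pages:
            for link in links_of(page):
                if link not in next_pages:  # skip stories linked twice
                    next_pages.append(link)
        pages = next_pages
    return pages


# Hypothetical in-memory site standing in for real HTTP fetches.
SITE: Dict[str, List[str]] = {
    "issues": ["toc-1", "toc-2"],
    "toc-1": ["story-a", "story-b"],
    "toc-2": ["story-b", "story-c"],
}


def site_links(page: str) -> List[str]:
    """Return the links found on `page` in the toy site above."""
    return SITE.get(page, [])
```

    <p>
      Calling <tt>collect_story_pages("issues", 3, site_links)</tt>
      walks from the issues list through both tables of contents
      and returns each story page once.
    </p>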
    <p>
      In addition, sites that post news as items on one big page,
      such as Slashdot, Ars Technica, and BluesNews, are supported
      using diff.
    </p>
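    <p>
      The diff approach can be sketched roughly as follows. This is
      an illustration in Python using the standard
      <tt>difflib</tt> module, not the code sitescooper actually
      runs; the <tt>new_lines</tt> helper is a hypothetical name,
      and it assumes you have kept the page text from the previous
      run.
    </p>

```python
import difflib
from typing import List


def new_lines(previous: List[str], current: List[str]) -> List[str]:
    """Return the lines of `current` that were not in `previous`.

    `previous` is the page text saved on the last run and `current`
    is the freshly fetched text, both split into lines.
    """
    matcher = difflib.SequenceMatcher(a=previous, b=current, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute lines only seen now.
        if tag in ("insert", "replace"):
            added.extend(current[j1:j2])
    return added
```

    <p>
      On a page where one new item has been posted above two old
      ones, only the new item's lines come back, so only those
      would need converting.
    </p>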
    <p>

      It even trims out sidebar tables automatically, by assuming that
      tables narrower than 30% of the average browser width are not part
      of the news story. Effectively, sitescooper is a <a
      href="http://www.research.ibm.com/networked_data_systems/transcoding/">transcoder</a>
      for handheld PCs.

    </p>
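    <p>
      As a rough picture of that width heuristic, here is a small
      Python sketch. Only the 30% threshold comes from the text
      above; the 640-pixel "average browser width", the function
      names, and the reading of the table's WIDTH attribute are
      assumptions made for the example, and sitescooper's real
      logic may well differ.
    </p>

```python
from typing import Optional

ASSUMED_BROWSER_WIDTH_PX = 640  # assumption for this sketch only
SIDEBAR_THRESHOLD = 0.30        # the 30% figure from the README


def table_width_fraction(width_attr: Optional[str],
                         browser_width_px: int = ASSUMED_BROWSER_WIDTH_PX
                         ) -> Optional[float]:
    """Turn a table WIDTH attribute ("150" or "25%") into a fraction.

    Returns None when the attribute is missing or cannot be parsed.
    """
    if not width_attr:
        return None
    value = width_attr.strip()
    try:
        if value.endswith("%"):
            return float(value[:-1]) / 100.0
        return float(value) / browser_width_px
    except ValueError:
        return None


def is_sidebar_table(width_attr: Optional[str],
                     browser_width_px: int = ASSUMED_BROWSER_WIDTH_PX) -> bool:
    """Guess whether a table is a sidebar, going by its width alone."""
    fraction = table_width_fraction(width_attr, browser_width_px)
    # Tables with no usable width are kept rather than trimmed.
    return fraction is not None and fraction < SIDEBAR_THRESHOLD
```

    <p>
      Keeping tables whose width is unknown is a deliberately
      cautious choice here: losing part of a story seems worse than
      carrying an occasional sidebar.
    </p>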
    <p>
      The script should run easily on most UNIX variants that
      support perl, as well as the Win32 platform, even Windows 95
      (tested with ActivePerl 5.00502 build 509). It has been
      reported to work on a Mac, using MacPerl 5.1.9r4.
    </p>
    <p>
      Output is supported in the following formats:
    </p>
    <ul>
      <li>
        <p>
          plain text
        </p>
      </li>
      <li>
        <p>

          <a href="http://plucker.gnu-designs.org/">Plucker</a>, an HTML-based
          format for Palm Computing organizers.  Plucker is free software
          licensed under the GPL, like sitescooper.

        </p>
      </li>
      <li>
        <p>
          iSilo, an HTML-based format for Palm Computing
          organizers from DC and Co., available from
          <tt><a href="http://www.isilo.com/">http://www.isilo.com/</a></tt>
        </p>
      </li>
      <li>
        <p>
          RichReader, an RTF-based format that keeps text formatting;
          see <tt><a href=
          "http://users.erols.com/arenakm/palm/RichReader.html">
          http://users.erols.com/arenakm/palm/RichReader.html</a></tt>
        </p>
      </li>
      <li>
        <p>
          DOC format, as used by AportisDoc, TealDoc, CSpotRun,
          etc.
        </p>
      </li>
      <li>
        <p>
          any other format using the -pipe switch.
        </p>
      </li>
    </ul>
    <p>
      DOC format, Plucker format, and text are all free.  RichReader is
      shareware, and iSilo has both shareware and free readers
      available.
    </p>
    <p>
      You may ask, "why not just use AvantGo, 'lynx -dump' and 'makedoc', or
      some other web-page-downloading software?" Well, sitescooper has several
      advantages:
    </p>
    <ul>
      <li>
        <p>
          it will follow links, and has a sophisticated set of
          mechanisms to follow the right links and use the
          "printing version" of a story;
        </p>
      </li>
      <li>
        <p>
          it can use heuristics to trim out irrelevant tables;
        </p>
      </li>
      <li>
        <p>
          the HTML rendering code is optimised for viewing on a
          Palm handheld: it trims all images (even their ALT
          text), forms, and extraneous headers and footers (based
          on the .site file), leaving much more free space on
          your handheld;
        </p>
      </li>
      <li>
        <p>
          it's <i>very</i> configurable for each target site -- you can
	  even use Perl code in a site file to rewrite the HTML as it's
	  scooped;
        </p>
      </li>
      <li>
        <p>
          it tracks what stories you've already read, and is quite
	  sophisticated about removing text you've seen before;
        </p>
      </li>
      <li>
        <p>
          it's portable to UNIX, Win32, Mac, and any other
          perl-supporting platform;
        </p>
      </li>
      <li>
        <p>
          it's free software, distributed under the GNU GPL.
        </p>
      </li>
    </ul>
    <p>
      In short, it's pretty neat.
    </p>
    <p>
      Pick up the latest version of sitescooper at the following
      URL:
    </p>
    <blockquote>
      <tt><a href="http://sitescooper.org/">
      http://sitescooper.org/</a></tt>
    </blockquote>
    <p>
      Sitescooper is distributed under the GNU GPL, and as such
      is free software; you can redistribute it and/or
      modify it under the terms of the GNU General Public License
      as published by the Free Software Foundation; either version 2
      of the License, or (at your option) any later version.
    </p>

    <p>
      This program is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      GNU General Public License for more details.
      The full text of the GPL is available <a href=gpl.html>here</a>.
    </p>

    <p>
      The next thing to do is to follow the links below to the next
      section, Installing.
    </p>

<!-- start of nav links --><hr>
<p align=right>
<nobr> [
<a href=index.html>README</a> ]
<br>
[
<a href=running.html>Running</a> ]|[
<a href=sitescooper.html>Command-line Arguments Reference</a> ]
<br>
[
<a href=writing_site.html>Writing a Site File</a> ]|[
<a href=site_params.html>Site File Parameters Reference</a> ]
<br>
[
<a href=rss-to-site.html>The rss-to-site Conversion Tool</a> ]|[
<a href=subs-to-site.html>The subs-to-site Conversion Tool</a> ]
<br>
[
<a href=contributing.html>Contributing</a> ]|[
<a href=gpl.html>GPL</a> ]|[
<a href="http://sitescooper.org/">Home Page</a> ]
</nobr>
</p>
<!-- end of nav links --> </body></html>