pavuk 0.9.35-2.1
This is a public beta release of Pavuk.
Pavuk means "spider" in Slovak.

What this program does:
-recursive retrieval of documents over HTTP, HTTP over SSL, FTP,
 FTP over SSL and Gopher
-supports HTTP/1.1 with persistent connections
-supports HTTP POST requests
-can automatically fill in forms from HTML documents and make POST or GET
 requests based on user input and form content
-synchronizes retrieved local copies of documents with the remote originals
-partial content retrieval from servers which support it (FTP and HTTP/1.1)
-automatically follows moved documents
-supports robots exclusion standard via "robots.txt" and <META NAME=robots ..>
-supports HTTP, FTP, Gopher, SSL proxy servers
-supports HTTP authentication (user, Basic, Digest, NTLM)
-shows document tree
-has an interface to the "at" command for scheduling
-has an optional GTK+ user interface
-may be built with or without the X Window user interface
-can handle setup files (scenarios)
-supports NLS via GNU gettext (so messages can be translated into many
 native languages)
-can be used for fetching documents to proxy/cache server (-mode dontstore)
-supports HTTP cookies
-supports SOCKS(4/5) proxy
-you can run as many instances of pavuk as you like in the same tree
 without any loss of data, because pavuk locks documents while processing
-you can limit the transfer rate over the network (speed throttling)
-has a powerful mechanism for mapping URLs to local filenames (-fnrules)
-can check whether a URL has been modified and can send the list of
 modified URLs to any script
-can load files from a Netscape or MSIE browser cache directory
-can filter advertisement banners
-can automatically turn on/off output of messages to the terminal when
 running in foreground or background
-can run user postprocessing scripts for each downloaded document
-can run user scripts to decide whether links from the current HTML
 document should be downloaded
-can optionally generate statistical reports from a download, usable for
 web site link checking
-supports different FTP directory listing formats (SYSV, BSD, EPFL, NOVEL,
 VMS, DOS/WINDOWS)
-multithreading support
-supports multiple HTTP proxies with round robin scheduling
-has simple support for JavaScript using URL patterns
-supports persistent HTTP/1.0 proxy connections
-has simple JavaScript bindings to allow much more flexible conditions
 for excluding URLs from transfer
....?
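
To illustrate the robots exclusion item in the list above, here is a minimal
sketch of how a spider can consult "robots.txt" rules before fetching a URL.
It is not Pavuk's own code (Pavuk is written in C); it uses the Python
standard library, and the robots.txt content, the user agent name
"example-spider" and the helper allowed_urls() are assumptions made up for
this example. Handling of <META NAME=robots ..> is not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs from `urls` that the given robots.txt text
    permits `user_agent` to fetch, preserving their order."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/private/notes.html",
    ]
    # Print only the URLs a well-behaved spider may retrieve.
    for url in allowed_urls(EXAMPLE_ROBOTS_TXT, "example-spider", candidates):
        print(url)
```

A real spider would download the robots.txt file from each host it visits
and apply the check to every link before adding it to the download queue.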

License:
Everything is licensed under GPL. See COPYING and COPYING.LIB.

Ported to:

-Linux (x86,ppc) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with glibc2&LinuxThreads,
			glibc1&LinuxThreads and glibc1&pcthreads
	GUI: Gtk+

-Digital Unix 3.2 (alpha) (cc and gcc), 4.0 (alpha) (cc and egcs)
	Single threaded: supported, works well
	Multi threaded: supported, works well
	GUI: Gtk+

-Ultrix 4.4 (mips) (cc,egcs) (you need to use bash instead of the default
                              sh to run the configure script)
	Single threaded: supported, works well
	Multi threaded: not supported, I don't know of any POSIX threads
			implementation for Ultrix
	GUI: Gtk+

-NetBSD (sparc, mips) (gcc)
	latest versions not tested

-NetBSD-1.4.2 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: not supported, I don't know of any POSIX threads
			implementation for this version of NetBSD
	GUI: Gtk+ (not tested)

-OpenBSD-2.7 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well (not much tested)
	GUI: Gtk+ (not tested)

-FreeBSD-4.0 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: works, but sometimes I get the kernel error
			"microuptime() went backwards"
	GUI: Gtk+ (you need to pass the option --with-gtk-config=gtk12-config
	     to the configure script)

-WIN32 (x86) (gcc + cygwin POSIX emulation layer)
	Single threaded: supported, works well
	Multi threaded: doesn't work, because the POSIX threads implementation
			is not finished yet
	GUI: win32 Gtk+ for cygwin

-Solaris 7 (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with POSIX threads
	GUI: Gtk+

-QNX RtP (x86) (gcc)
	Single threaded: supported, works well
	Multi threaded: supported, works well with libc POSIX threads
	GUI: Gtk+ in XPhoton

-BeOS 5 PE (x86) (gcc)
	Single threaded: supported, works except document locking
	Multi threaded: not supported, missing POSIX threads
	GUI: not supported

-also reported to work on several other UNIX-es, Mac OS X Server and OS/2

Author:
Ondrejicka Stefan

Home page:
http://pavuk.sourceforge.net/

If you find any bugs, please send me a report and/or a fix.
If you have any suggestions, ideas or questions, please contact the developers.