
DEJASEARCH Version 1.8.4
AUTHORS      : Chew Wei Yih, Victor <vchew@post1.com>
               Steffen Ullrich <coyote.frank@gmx.net>
               Frank de Lange <frank@unternet.org>
               J.R. Tietsort <jrtietsort@micron.com>
               Dan Shiovitz <dans@drizzle.com>
CONTRIBUTORS : Andre Majorel <amajorel@teaser.fr>
URL          : http://homemade.hypermart.net/dejasearch/

1. OVERVIEW

DejaSearch is a frontend to Deja.com (http://www.deja.com/), the leading
Usenet archive and search engine. Deja.com is a great resource to uncover
even the most obscure information from the sea of data that is Usenet. I
frequently use it to find answers to technical problems, or to learn about
how people rate particular products before making a more informed purchase.
DejaSearch will submit a search for you to Deja.com, then retrieve and
consolidate all search results into one single HTML file, sorted by
newsgroup, then subject, then date (newest first). This means related messages
will be closer to one another, with the more recent messages nearer to the top.
This also means you can print out the entire file and read the messages at
your leisure, instead of having to go through them one by one on the screen.
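
The ordering described above can be illustrated with the standard sort
utility. This is only a sketch of the sort order, not DejaSearch's own code;
the pipe-separated record layout and the sample messages are made up for the
example.

```shell
# Hypothetical records in the form newsgroup|subject|date (YYYY-MM-DD).
records='comp.os.linux.networking|ppp dialup|1999-03-02
comp.os.linux.misc|ip masquerading|1999-01-15
comp.os.linux.networking|ppp dialup|1999-04-10
comp.os.linux.misc|ip masquerading|1999-02-20'

# Sort by newsgroup, then subject, then date with the newest message first.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t'|' -k1,1 -k2,2 -k3,3r)
printf '%s\n' "$sorted"
```

Messages from the same newsgroup and thread end up next to each other, and
within each thread the most recent message comes first.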

2. DEPENDENCIES

DejaSearch should be able to run on all Unix systems with a Perl interpreter
installed. At present, it has only been tested on Linux.
It has been reported that DejaSearch also runs with:
- ActivePerl (for Windoze)
- Apache (for Windoze)

3. USAGE

The usage pattern is of the form:

    dejasearch [-proxy p] [-max m] [-output o] <search keywords>

The parameters for DejaSearch are explained in more detail below:

    -proxy <URL>        Proxy server in http://hostname:port format
                        (Alternatively, you can make use of the
                        "http_proxy" environment variable)
    -max <num>          Maximum number of messages to retrieve
    -output <filename>  Output HTML file (default: summary.html)
    -type <type>        Valid values are recent, old and all (default: all)
    -format <type>      Deja.com search results format.
                        Valid values are classic, new or mbox (default: new)
    -fromdate <date>    Date to limit search from (eg. Apr+1+1997)
    -todate <date>      Date to limit search to (eg. Apr+8+1997)
    -[no]status         Display download status. (default: yes)
    -[no]verbose        Display search status. (default: yes)
                        Note that -noverbose implies -nostatus.
    -sleep <secs>       Sleep given number of secs between each retrieval.
                        (default: 0)

Some examples:

    dejasearch linux ip masquerading ppp

This will search for all messages containing "linux", "ip", "masquerading"
and "ppp" and output all messages to "summary.html".

    dejasearch "linux [ip masquerading] ppp"

This will search for all messages containing "linux", the phrase
"ip masquerading" and "ppp" and output all messages to "summary.html". Since
I don't know of a way to pass double quotes through the shell (I use tcsh),
I chose '[' and ']' as an alternative way to specify a phrase.

    dejasearch "linux & (ip ^ masquerading) & ppp"

This will search for all messages containing "linux", "ppp", and "ip" close
to "masquerading", and output all messages to "summary.html".

    dejasearch -max 50 -output results.html linux ip masquerading ppp

This will search for up to 50 messages containing "linux", "ip", "masquerading"
and "ppp" and output all messages to "results.html".
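
The -fromdate and -todate options expect dates written like Apr+1+1997. A
small helper along the following lines could build that form from an ISO
date. The function name deja_date is made up for this sketch, and it assumes
that every month follows the same three-letter pattern as the "Apr" examples
in the option list.

```shell
# deja_date YYYY-MM-DD  ->  Mon+D+YYYY  (hypothetical helper, not part of
# DejaSearch; assumes three-letter English month abbreviations).
deja_date() {
    year=${1%%-*}          # text before the first dash
    rest=${1#*-}           # month and day
    month=${rest%%-*}
    day=${rest#*-}
    month=${month#0}       # drop a leading zero, if any
    day=${day#0}
    set -- Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    shift $((month - 1))   # move the wanted month name into $1
    printf '%s+%s+%s\n' "$1" "$day" "$year"
}

# Possible use, once dejasearch is on your PATH:
# dejasearch -fromdate "$(deja_date 1997-04-01)" \
#            -todate "$(deja_date 1997-04-08)" -max 50 linux ppp
```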

4. DEJA.COM SEARCH LANGUAGE QUICK REFERENCE

Keywords can be separated by the following connectors:

    &      - AND            e.g. beans & rice
    |      - OR                  camel | llama
    &!     - AND NOT             clam &! chowder
    ^      - NEAR                lucas ^ spielberg

Keywords can be combined with the following symbols:

    "..."  - Quote Marks         "the far side"
    *      - Wildcard            psych*
    (...)  - Parentheses         scully & (xfiles | x-files)
    {...}  - Braces              {monkey monkeying}

Keywords can be preceded by the following context operators:

    ~a     - Author              ~a demos@deja.com
    ~s     - Subject             ~s chess
    ~g     - Newsgroup           ~g alt.love
    ~dc    - Creation date       ~dc 1996/12/31
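
Most of these connectors (&, |, ^, parentheses) are also special characters
to the shell, so a query has to reach dejasearch as one quoted argument. The
sketch below joins terms with the AND connector; and_terms is a hypothetical
helper name, not something DejaSearch provides.

```shell
# Join all arguments with " & " and print the result as a single query.
and_terms() {
    query=$1
    shift
    for term in "$@"; do
        query="$query & $term"
    done
    printf '%s\n' "$query"
}

# Possible use; the double quotes keep the query as one argument:
# dejasearch "$(and_terms linux '(ip ^ masquerading)' ppp)"
```

Single quotes around each term stop the shell from interpreting the
parentheses and the ^ character before the helper sees them.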

5. USING DEJASEARCH AS A CGI SCRIPT

First copy dejasearch into your cgi-bin directory and give it the proper
access permissions (the web server must be able to execute it). Then access
dejasearch with your browser! It's as simple as that!

    e.g. http://my.site.com/cgi-bin/dejasearch

Some additional information below (mostly copied from Frank's email to me):
- On non-frames browsers it presents a frameless page with a simple,
  single-line search form.
- On frames-capable browsers (actually, all browsers except those in the
  non_frames_browsers array) it presents either a two- or three-paned
  interface:
    - If the user chooses 'Headers only', it uses the deja.com primary
      results pages to extract the subject, author, forum and date. These
      are then presented in list format in the second frame. When the user
      clicks on one of those links, the result is presented in the third
      frame. This way, you never have to switch browser windows or use the
      back button to see more results.
    - If the user chooses 'Show messages', the second frame contains the
      normal dejasearch output.
The script also has some new options:
- new or classic interface (option -format new|classic)
  Deja.com actually still offers the original, usable (in contrast to the
  current bloated portal-monster) interface. You can reach this original
  interface using the =dnc (dejanewsclassic?) modifier in the URL. Have a
  look at the %deja hash in the source for the URLs and search patterns to
  use...
  The classic interface is much cleaner and faster, but it does not offer
  the colored responses of the new interface. Even so, on a modem
  connection the difference is quite noticeable (as in twice as fast). No
  more than expected of course, since the new interface is so
  bandwidth-hungry.
  By the way, both classic and new interfaces still offer the 'Text
  only' option using the fmt=text modifier. This way, you get the speediest
  deja.com ever, but you lose the pre-formatted URLs etc. I think I will
  add this as a third interface option though. I might even include my own
  URL formatting et al.

      http://www.deja.com/=dnc/[ST_rn=qs]/getdoc.xp?AN=474042698&fmt=text
          (classic interface)
      http://x31.deja.com/[ST_rn=qs]/getdoc.xp?AN=474042698&fmt=text
          (new interface)

  These two URLs should give the same results.
The script does NOT need to be started from a static HTML page.
Note: If you are running your web server behind a firewall, make sure you
set the proper proxy settings for the web server account, or hardcode the
proxy settings into DejaSearch.
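
How the environment of the web server account is set depends on your server,
so the following is only a sketch of the value itself: it checks that a proxy
setting has the http://hostname:port form listed under the -proxy option
before exporting it as http_proxy. The helper name is_proxy_url and the host
proxy.example.com are placeholders.

```shell
# Succeeds only if the argument looks like http://hostname:port
# (hypothetical check, not part of DejaSearch).
is_proxy_url() {
    printf '%s\n' "$1" | grep -Eq '^http://[A-Za-z0-9.-]+:[0-9]+/?$'
}

# proxy.example.com:8080 is a placeholder, not a real proxy.
proxy='http://proxy.example.com:8080'
if is_proxy_url "$proxy"; then
    http_proxy=$proxy
    export http_proxy
else
    echo "unexpected proxy format: $proxy" >&2
fi
```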

6. USING DEJASEARCH WITH LYNX' SIMULATED CGI SUPPORT

(This tip was contributed by Morten Bo Johansen <mojo@image.dk>)
This is a nice tip for Lynx users: if you only want to access a CGI script
occasionally and have it deliver its results to you locally, running a
web server such as Apache is overkill. For this purpose Lynx can be
configured to access CGI scripts without any httpd daemon running in the
background. To enable this, simply configure Lynx with cgi-links enabled

    ./configure --enable-cgi-links

prior to building.
Once Lynx is compiled and installed, place the dejasearch Perl script
anywhere you like, e.g. /usr/local/httpd/cgi-bin/dejasearch, and access it
from the Lynx prompt with this line:

    lynxcgi:/usr/local/httpd/cgi-bin/dejasearch

The dejasearch search form will appear on your screen and you're ready to
go!
Note: This was tested with Lynx ver. 2.8.3dev8 and Dejasearch ver. 1.65. There
were some problems with Lynx ver. 2.8.2 and Dejasearch ver. 1.64, in that
the messages themselves apparently could not be retrieved from an index.
The cause has not been investigated, so other combinations, for instance
Lynx 2.8.2 with Dejasearch ver. 1.65, may also work.

7. AUTO-LAUNCHING THE SEARCH RESULTS IN YOUR BROWSER

(This tip was contributed by Bill Goffe <Bill.Goffe@usm.edu>)
I just got a copy of dejasearch, and I like it quite a lot. I wrote the
following script (hardly worthy of the name) that puts the results in a
Netscape window. Maybe something like it could be part of dejasearch?
Output to just summary.html is kinda dull.
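
    #!/usr/bin/perl
    use strict;
    use warnings;
    # Pass the arguments as a list so quoted queries reach dejasearch intact.
    system('dejasearch', @ARGV, '-output', '/tmp/ds.html') == 0
        or die "dejasearch failed\n";
    # Ask an already-running Netscape to open the results file.
    system('netscape', '-remote', 'openURL(file:/tmp/ds.html)');
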
I would prefer not to integrate this into DejaSearch, since it is
browser-dependent. Those of you who need this can use Bill's script instead.
The above only works with Netscape browsers, and the browser must already be
running. However, it should be easy to adapt this to work with other
browsers.
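
One possible way to adapt it is to pick the launch command from an
environment variable. The sketch below only prints the command it would use,
so you can check it before running anything. The function name
browser_command and the use of a BROWSER variable are assumptions of this
example, not features of DejaSearch.

```shell
# Print the command that would open the given results file.
# BROWSER is assumed to name the browser executable (default: netscape).
browser_command() {
    file=$1
    browser=${BROWSER:-netscape}
    case $browser in
        netscape)
            # Netscape needs the -remote form and an already-running browser.
            printf "netscape -remote 'openURL(file:%s)'\n" "$file" ;;
        *)
            # Other browsers are assumed to accept the file name directly.
            printf '%s %s\n' "$browser" "$file" ;;
    esac
}

# Possible use:
# BROWSER=lynx browser_command /tmp/ds.html
```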

8. USING DEJASEARCH WITH AN AUTHENTICATING PROXY

If you are behind a proxy which requires authentication, edit dejasearch and
find the string "$auth". Then modify it to:
$auth = "username:password";
Alternatively, you can create a file called ".dejasearchrc" in your home
directory with the proxy authentication information in "username:password"
format (on a single line and without the quotes). The mode of this file must
be 400 or 600; otherwise DejaSearch aborts with a non-zero status.
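
A minimal sketch of creating such a file with the required mode follows. The
function name write_authrc is made up for this example, and
"username:password" stands for your real credentials.

```shell
# write_authrc FILE CREDENTIALS
# Writes the credentials on a single line and leaves the file at mode 600.
write_authrc() {
    rcfile=$1
    credentials=$2
    old_umask=$(umask)
    umask 077                  # a newly created file is private from the start
    printf '%s\n' "$credentials" > "$rcfile"
    umask "$old_umask"
    chmod 600 "$rcfile"        # also tightens a file that already existed
}

# Possible use:
# write_authrc "$HOME/.dejasearchrc" 'username:password'
```

Setting the umask before writing avoids a short window in which the
credentials would be readable by other users.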

9. ACKNOWLEDGMENT

I would like to thank the GNU people. I don't know them personally, but they
have blessed us with free and great tools such as Linux, gcc, emacs, Perl,
fetchmail etc. which I now use on a daily basis. In the same selfless
spirit, I would also like to share DejaSearch, and I hope many people
besides me find it useful.
Thanks to all who provided valuable feedback/patches to make this program
better. Godspeed!