WWWOFFLE - World Wide Web Offline Explorer - Version 2.6
========================================================
This is the logic that the WWWOFFLE program follows when handling requests for
URLs that have a password in the header or in the URL itself.
Background Information
----------------------
1) When a browser first requests a page that is password protected, a normal
request is sent without a password in it. This is unavoidable since the
browser has no way to decide in advance which pages have passwords.
2) When a server receives a request for a page that requires authentication,
but for which there is none in the request, it sends back a '401 Unauthorized'
response. This contains a "realm" which defines the range of pages over
which this username/password pair is valid. A realm is not a well-defined
range; it can be any set of pages on the same server. There is no requirement
for them to be related, although they normally are.
3) When a browser receives a '401' reply it will prompt the user for a username
and password if it does not already have one for the specified realm. If one
is already known then there is no need to prompt the user again.
4) The request that the browser sends back this time includes the username and
password pair in the header, but is otherwise the same as the request in (1).
5) The server now sends back the requested page.
6) Some browsers follow steps (1)-(5) for all pages on the server. Others try
to guess the range of pages that are covered by a realm and then send the
username/password pair for, for example, all pages in the same directory.
This means that they follow steps (3)-(5) and miss out steps (1) and (2) for
these pages.
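The exchange in steps (2)-(4) can be sketched as follows. This is only an
illustration and not part of WWWOFFLE: it assumes the HTTP "Basic"
authentication scheme, and the names parse_realm, basic_authorization and
CredentialStore are hypothetical.

```python
import base64
import re


def parse_realm(www_authenticate):
    """Return the realm named in a WWW-Authenticate header value, or None."""
    match = re.search(r'realm="([^"]*)"', www_authenticate, re.IGNORECASE)
    return match.group(1) if match else None


def basic_authorization(username, password):
    """Build the Authorization header value for the Basic scheme (step 4)."""
    pair = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")


class CredentialStore:
    """Remembers one username/password pair per (host, realm), as in step 3."""

    def __init__(self):
        self._known = {}

    def lookup(self, host, realm):
        # None means the user still has to be prompted for this realm.
        return self._known.get((host, realm))

    def remember(self, host, realm, username, password):
        self._known[(host, realm)] = (username, password)
```

A browser that behaves as in step (3) would call lookup() after a '401' reply
and prompt the user only when it returns None.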
WWWOFFLE Implementation
-----------------------
1) If a password is specified in the request then it is handled as if it were in
the URL itself. This means that the spool file name is hashed in the same
way as normal, but it contains the username/password.
2) For every page that is cached with a username/password, a version of the
page without the username/password is also placed in the cache. This ensures
that when the page is later requested while offline, the version without the
password can be sent to prompt the browser. This solves a problem with
browsers that send username/password pairs for all pages: after the browser
is closed and restarted, a request for one of the pages (bookmarked perhaps)
would otherwise not work, since the page without the username/password would
not be present and so would only be requested for later fetching.
3) The mode of operation of the WWWOFFLE server is as follows:
URL = URL without password
URLpw = URL with password
WWWOFFLES mode - See README
WWWOFFLES | Password  | URL     | URLpw   | Action to take
mode      | provided? | cached? | cached? |
----------+-----------+---------+---------+-------------------------------------
Spool     | No        | No      | n/a     | Request URL (->F)
Spool     | No        | Yes     | n/a     | Spool URL
Spool     | Yes       | No      | No      | Request URLpw (->F)
Spool     | Yes       | No      | Yes     | Spool URLpw, Request URL (->F)
Spool     | Yes       | Yes     | No      | if(!401) Spool URL
Spool     | Yes       | Yes     | No      | if(401) Request URLpw (->F)
Spool     | Yes       | Yes     | Yes     | if(!401) Spool URL
Spool     | Yes       | Yes     | Yes     | if(401) Spool URLpw
----------+-----------+---------+---------+-------------------------------------
Fetch     | No        | n/a     | n/a     | Get URL
Fetch     | Yes       | No      | n/a     | Get URL, if(401) Get URLpw
Fetch     | Yes       | Yes     | n/a     | if(!401) Get URL
Fetch     | Yes       | Yes     | n/a     | if(401) Get URLpw
----------+-----------+---------+---------+-------------------------------------
Real      | No        | n/a     | n/a     | Get URL
Real      | Yes       | No      | n/a     | Get URL, if(401) Get URLpw
Real      | Yes       | Yes     | n/a     | if(!401) Get URL
Real      | Yes       | Yes     | n/a     | if(401) Get URLpw
----------+-----------+---------+---------+-------------------------------------
The other minor modes (SpoolOrReal, RealPragma etc.) act like the one that they
are based on.
4) When fetching recursively, a supplied username/password is used only on the
same server, but for all requests (fetch mode sorts out which need it).
5) When a username is supplied but no password (e.g. an FTP URL with the
username in the URL) then always return a page prompting for a password.
6) When the configuration option try-without-password is false (it defaults to
true), this behaviour is modified. If a URL is requested with a password then
the existence or not of the same URL without a password is ignored. This
means that the behaviour is the same as for a request for a page that does
not have a password: it is based only on the requested page itself.
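The table in (3) and the option in (6) can be read as a pair of small decision
functions. The sketch below is one possible reading, not the code used by
WWWOFFLE: the function names, the cached_is_401 flag (true when the cached
passwordless page is a '401' reply) and the handling of try-without-password
are assumptions, and the "(->F)" annotations from the table are left out.

```python
def spool_actions(password, url_cached, urlpw_cached,
                  cached_is_401=False, try_without_password=True):
    """Actions taken in Spool mode, as a list of strings from the table."""
    if not password:
        return ["Spool URL"] if url_cached else ["Request URL"]
    if not try_without_password:
        # Item (6) as read here: the passwordless copy is ignored entirely.
        return ["Spool URLpw"] if urlpw_cached else ["Request URLpw"]
    if not url_cached:
        if urlpw_cached:
            return ["Spool URLpw", "Request URL"]
        return ["Request URLpw"]
    if not cached_is_401:
        # The passwordless page is usable, so the password is not needed.
        return ["Spool URL"]
    return ["Spool URLpw"] if urlpw_cached else ["Request URLpw"]


def online_actions(password, url_cached,
                   cached_is_401=False, try_without_password=True):
    """Actions taken in Fetch or Real mode; both share the same rows."""
    if not password:
        return ["Get URL"]
    if not try_without_password:
        # Assumed reading of item (6) for the online modes.
        return ["Get URLpw"]
    if not url_cached:
        return ["Get URL", "if 401: Get URLpw"]
    return ["Get URLpw"] if cached_is_401 else ["Get URL"]
```

For example, spool_actions(True, False, True) gives the fourth Spool row: the
cached page with the password is sent, and the passwordless URL is requested
so that it is available to prompt the browser later.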
Andrew M. Bishop
17th September 2000