File: notes.tex

\begin{verbatim}
JARs and JAR entries in ABCL
============================

    Mark Evenson
    Created:  09 JAN 2010
    Modified: 21 JUN 2011

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.
    
2.  Use `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

         #p"jar:file:baz.jar!/foo"
    
    A JAR file:

         #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

         #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL-accessible JAR file:

         #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
         
[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:" with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References of the form "jar:<URL>" work for all strings <URL> that
    java.net.URL can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
:DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p 
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")      
        ==> t

Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can either refer to the entire JAR file or an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of Pathname.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of the DEVICE list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to have at
most two elements.
    
The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.


BNF
---

An incomplete BNF of the syntax of JAR PATHNAME would be:

      JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

      URL ::= <URL parsable via java.net.URL.URL()>
            | JAR-FILE-PATHNAME

      JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

      JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                         | RELATIVE-FILE-NAMESTRING

      ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
the local filesystem conventions, meaning that on Windows they could
contain '\' as the directory separator, which is always normalized to
'/'.  An `ENTRY` always uses '/' to separate directories within the
jar archive.
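
The grammar above can be exercised with a small splitter.  The Java
sketch below is illustrative only and is not taken from ABCL's
`Pathname.java`; the class `JarNamestringSketch` and its `parse`
method are hypothetical names.  It peels off one enclosing jar per
leading "jar:" prefix, outermost first, and treats whatever follows
the last "!/" as the entry.  It does not validate the URL part, and it
applies the '\' to '/' normalization from Note 1 naively to the whole
jar part.

```java
import java.util.ArrayList;
import java.util.List;

public class JarNamestringSketch {

    /** Result holder: enclosing jars (outermost first) plus the final entry. */
    public static final class Parsed {
        public final List<String> jars;
        public final String entry;   // empty string when the namestring names the jar itself

        Parsed(List<String> jars, String entry) {
            this.jars = jars;
            this.entry = entry;
        }
    }

    public static Parsed parse(String namestring) {
        // Each leading "jar:" announces one "!/" separator further right.
        int depth = 0;
        String rest = namestring;
        while (rest.startsWith("jar:")) {
            depth++;
            rest = rest.substring("jar:".length());
        }
        if (depth == 0) {
            throw new IllegalArgumentException("not a jar namestring: " + namestring);
        }
        List<String> jars = new ArrayList<>();
        for (int i = 0; i < depth; i++) {
            int bang = rest.indexOf("!/");
            if (bang < 0) {
                throw new IllegalArgumentException("missing \"!/\" in: " + namestring);
            }
            // Assumption: backslashes in the jar part are directory separators (Note 1).
            jars.add(rest.substring(0, bang).replace('\\', '/'));
            rest = rest.substring(bang + 2);
        }
        return new Parsed(jars, rest);
    }

    public static void main(String[] args) {
        Parsed p = parse("jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls");
        System.out.println(p.jars + " " + p.entry);
    }
}
```

For the nested namestring in `main`, the jar chain is "file:baz.jar"
followed by "foo.abcl", and the entry is "foo-1.cls", which mirrors
the two-element DEVICE list described under Implementation.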


Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: ( 
        pathname: {  
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry 
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
        device: "jar:file:"
        name: "baz"
        type: "jar"
      }) 
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: ( 
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        } 
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: ( 
        pathname: {
          directory: (:RELATIVE "a")      
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "this" "that")
      name: "foo-20"
      type: "cls" 
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in a http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: ( 
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:ABSOLUTE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
       namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
       device: (
         pathname: {
           namestring: "http://example.org/abcl.jar"
         }
         pathname: { 
           name: "foo"
           type: "abcl"
         }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory

    pathname: {
       namestring: "jar:file:/a/b/foo.jar!/"
       device: (
         pathname: {
           directory: (:ABSOLUTE "a" "b")
           name: "foo"
           type: "jar"
         }
       )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
       namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
       device: (
         pathname: {
           directory: (:RELATIVE "a" "b")
           name: "foo"
           type: "jar"
         }
       )
       directory: (:ABSOLUTE "c" "d")
       name: "foo"
       type: "lisp"
    }


URI Encoding 
------------

As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
that type.  Most notably this means that all #\Space characters should
be encoded as '%20' when dealing with jar entries.
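
As an illustration of the rule above, the following Java sketch builds
a "jar:file:" namestring whose spaces are percent-encoded.  It relies
on the multi-argument `java.net.URI` constructor, which quotes illegal
characters such as spaces.  The class `JarSpaceEncodingSketch`, its
methods, and the sample paths are hypothetical; this is not how ABCL
itself performs the encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class JarSpaceEncodingSketch {

    /** Percent-encode a '/'-separated absolute path (spaces become %20). */
    public static String encodePath(String absolutePath) {
        try {
            // The (scheme, host, path, fragment) constructor quotes illegal characters.
            return new URI(null, null, absolutePath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Build "jar:file:<jar>!/<entry>" from an absolute jar path and a relative entry. */
    public static String jarNamestring(String jarFile, String entry) {
        // The entry is encoded with a temporary leading '/', which is then dropped.
        String encodedEntry = encodePath("/" + entry).substring(1);
        return "jar:file:" + encodePath(jarFile) + "!/" + encodedEntry;
    }

    public static void main(String[] args) {
        System.out.println(jarNamestring("/my libs/some.jar", "source files/foo.lisp"));
    }
}
```

With the sample arguments in `main`, the result is
"jar:file:/my%20libs/some.jar!/source%20files/foo.lisp".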


History
-------

Previously, ABCL did have some support for jar pathnames. This support
used the convention that if the device field was itself a
pathname, the device pathname contained the location of the jar.

In the analysis of the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in jar.  At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But with this use of
`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
path support uses the `DEVICE` field, so JARs located on UNC mounts
could not be referenced via '\\', i.e.

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: ( 
                pathname: {
                  host: "server"
                  device: "share"
                  name: "foo"
                  type: "jar"
                }
                pathname: {
                  name: "foo"
                  type: "abcl"
                }
              )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.

\end{verbatim}
\begin{verbatim}



URL Pathnames in ABCL
=====================

    Mark Evenson
    Created:  25 MAR 2010
    Modified: 21 JUN 2011

Notes towards an implementation of URL references to be contained in
Common Lisp `PATHNAME` objects within ABCL.


References
----------

RFC3986   Uniform Resource Identifier (URI): Generic Syntax


URL vs URI
----------

We use the term URL as shorthand in describing the URL Pathnames, even
though the corresponding encoding is more akin to a URI as described
in RFC3986.  


Goals
-----

1.  Use Common Lisp pathnames to refer to representations referenced
by a URL.

2.  The URL schemes supported shall include at least "http", and those
enabled by the URLStreamHandler extension mechanism.

3.  Use URL schemes that are understood by the java.net.URL object.

    Example of a Pathname specified by URL:
    
        #p"http://example.org/org/armedbear/systems/pgp.asd"
    
4.  MERGE-PATHNAMES working for URL pathnames:

        (merge-pathnames "url.asd"
            "http://example/org/armedbear/systems/pgp.asd")
        ==> "http://example/org/armedbear/systems/url.asd"

5.  PROBE-FILE returning the state of URL accessibility.

6.  TRUENAME "aliased" to PROBE-FILE signalling an error if the URL is
not accessible (see "Non-goal 1").

7.  DIRECTORY works for non-wildcards.

8.  URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.

9.  Enable the loading of ASDF2 systems referenced by a URL pathname.

10.  Pathnames constructed with the "file" scheme
(i.e. #p"file:/this/file") need to be properly URI encoded according
to RFC3986 or otherwise will signal FILE-ERROR.  

11.  The "file" scheme will continue to be represented by an
"ordinary" Pathname.  Thus, after construction of a URL Pathname with
the "file" scheme, the namestring of the resulting PATHNAME will no
longer contain the "file:" prefix.

12.  The "jar" scheme will continue to be represented by a jar
Pathname.
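
The MERGE-PATHNAMES behaviour in goal 4 has a close analogue in
relative-reference resolution on the Java side.  The sketch below uses
`java.net.URI.resolve` to reproduce that example; it is only an
analogy under that assumption, not a description of how ABCL
implements MERGE-PATHNAMES.  The class `UrlMergeSketch` and its
`merge` method are hypothetical names.

```java
import java.net.URI;

public class UrlMergeSketch {

    /** Resolve a relative reference against a base URL, replacing the last path segment. */
    public static String merge(String relative, String base) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        System.out.println(
            merge("url.asd", "http://example/org/armedbear/systems/pgp.asd"));
    }
}
```

For the arguments shown, the result is
"http://example/org/armedbear/systems/url.asd", matching the expected
value in goal 4.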


Non-goals 
---------

1.  We will not implement canonicalization of URL schemes (such as
following "http" redirects).

2.  DIRECTORY will not work for URL pathnames containing wildcards.


Implementation
--------------

A PATHNAME referring to a resource referenced by a URL is known as a
URL PATHNAME.

A URL PATHNAME always has a HOST component which is a proper list.
This list will be a property list (plist).  The property list
values must be character strings.

    :SCHEME
        Scheme of URI ("http", "ftp", "bundle", etc.)
    :AUTHORITY   
        Valid authority according to the URI scheme.  For "http" this
        could be "example.org:8080".
    :QUERY
        The query of the URI
    :FRAGMENT
        The fragment portion of the URI
        
The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
the URI `path` according to the conventions of the UNIX filesystem
(i.e. '/' is the directory separator).  In a sense the HOST contains
the base URL, to which the `path` is a relative URL (although this
abstraction is violated somewhat by the storing of the QUERY and
FRAGMENT portions of the URI in the HOST component).
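
To make the split between HOST and `path` concrete, the following Java
sketch decomposes a URL with `java.net.URI` into the four plist keys
listed above, leaving the path separate.  It is a hedged illustration,
not ABCL code: the class `UrlHostPlistSketch`, the use of a Java map
to stand in for the plist, the choice to omit absent keys, and the
query and fragment in the sample URL are all assumptions.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlHostPlistSketch {

    /** Collect the HOST plist keys; absent URI parts are simply left out. */
    public static Map<String, String> hostPlist(URI uri) {
        Map<String, String> plist = new LinkedHashMap<>();
        putIfPresent(plist, ":SCHEME", uri.getScheme());
        putIfPresent(plist, ":AUTHORITY", uri.getAuthority());
        putIfPresent(plist, ":QUERY", uri.getQuery());
        putIfPresent(plist, ":FRAGMENT", uri.getFragment());
        return plist;
    }

    private static void putIfPresent(Map<String, String> plist, String key, String value) {
        if (value != null) {
            plist.put(key, value);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "http://example.org:8080/org/armedbear/systems/pgp.asd?rev=1#top");
        System.out.println(hostPlist(uri));   // the HOST-like part
        System.out.println(uri.getPath());    // the part carried by DIRECTORY, NAME and TYPE
    }
}
```

Here the authority is "example.org:8080" and the path is
"/org/armedbear/systems/pgp.asd"; the query and fragment travel with
the HOST-like part rather than with the path, which is the abstraction
leak the paragraph above mentions.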

For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
match if their HOST components are EQUAL, and all other components are
considered to match according to the existing rules for Pathnames.

A URL pathname must have a DEVICE whose value is NIL.

Upon creation, any ".." and "." components in the
DIRECTORY are removed.  The DIRECTORY component, if present, is always
absolute.
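
The removal of "." and ".." components can be illustrated with
`java.net.URI.normalize`, which performs the same kind of dot-segment
removal on a URI path.  Whether ABCL uses this method internally is
not stated in these notes, so treat the sketch as an analogy; the
class `DotSegmentSketch` and the sample URL are hypothetical.

```java
import java.net.URI;

public class DotSegmentSketch {

    /** Return the path of the URL after "." and ".." segments have been removed. */
    public static String normalizedPath(String url) {
        return URI.create(url).normalize().getPath();
    }

    public static void main(String[] args) {
        System.out.println(normalizedPath("http://example.org/a/./b/../c/foo.lisp"));
    }
}
```

For the sample URL the normalized path is "/a/c/foo.lisp", which would
correspond to a DIRECTORY of (:ABSOLUTE "a" "c").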

The namestring of a URL pathname shall be formed by the usual
conventions of a URL.

A URL Pathname has type URL-PATHNAME, derived from PATHNAME.


URI Encoding 
------------

For dealing with URI Encoding (also known as [Percent Encoding]) we
adopt the following rules:

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

1.  All pathname components are represented "as is" without escaping.

2.  Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
    or a JAR-PATHNAME.

3.  Namestrings should all "round-trip":

    (when (typep p 'pathname)
       (equal (namestring p)
              (namestring (pathname p))))


Status
------

This design has been implemented.


History
-------

26 NOV 2010 Changed implementation to use URI encodings for the "file"
  scheme, including "file" URLs nested within the "jar" scheme, such as
  "jar:file:/location/of/some.jar!/".

21 JUN 2011 Fixed implementation to properly handle URI encodings
  referring to nested jar archives.

\end{verbatim}