<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>
Linux Packages Metadata Mirroring Proposal</title>
<meta name="GENERATOR" content="amaya V1.3">
</head>
<body bgcolor="#ffffff">

<h1 align=center>Linux Packages Metadata Mirroring Proposal</h1>
<p>
What does this title mean? Simply that a huge amount of precompiled software
is freely available for Linux, but it is hard to find the package one
needs:</p>
<ol>
<li>
It is difficult to locate the actual package that fulfills the needs of
the user ("I want a graphic editor which supports the PNG format").
<li>
Once the program has been found, getting the binary version for a given system
setup (distribution version, architecture, etc.) is usually hard.
<li>
Getting that binary from the nearest site, to minimize the download time,
again proves difficult.
<li>
Even when the correct package has been found, it is sometimes impossible to
install because of missing dependencies, and the quest continues!
</ol>
<p>
I suggest here a mechanism to help find the packages needed. It relies on
the propagation of Metadata (structured, machine-readable information) about
the available binary packages. It builds on the lessons learned from building
the <a href="http://rpmfind.net/linux/RPM/">RPM database on rpmfind.net</a>,
the development of the <a href="http://rpmfind.net/linux/rpm2html/">rpm2html</a>
program, the work done on <a href="http://www.imag.fr/Metadata/">Metadata
at W3C</a>, and discussions with <a href="mailto:jim@jimpick.com">Jim Pick</a>
(Debian) and <a href="mailto:marc@redhat.com">Marc Ewing</a> (RedHat).</p>
<p>
<img src="mirroring.gif" alt=" mirroring.gif "></p>
<p>
The picture above illustrates the four steps needed: creating, centralizing,
propagating and exposing package metadata.</p>

<h3>Extracting Metadata from binary packages</h3>
<p>
The basic idea is to extract useful information about a package, such as the
application name, revision, author and dependencies, and to save it in a
predefined format that can be parsed automatically. This is precisely <a
href="http://www.imag.fr/Metadata/">Metadata</a> (data about data), and I
suggest using <a href="http://www.imag.fr/TR/WD-rdf-syntax/">RDF</a>, the
<a href="http://www.imag.fr/XML/">XML</a>-based metadata encoding proposed by
<a href="http://www.imag.fr/">W3C</a>. Ideally
the description of the metadata should be independent of the package format.
In practice it may not be: for example, package dependencies are more
sophisticated in <a href="http://www.debian.org/">Debian</a> packages than in
<a href="http://www.rpm.org/">RPM</a> ones. Here is, for example, an RDF file
describing the RPM package "rpm2html-0.90-1.i386.rpm":</p>
<pre>&lt;?XML version="1.0"?>
&lt;?namespace href ="http://www.imag.fr/TR/WD-rdf-syntax#/" AS = "RDF"?>
&lt;?namespace href ="http://www.rpm.org/" AS = "RPM"?>
&lt;RDF:RDF>
 &lt;RDF:Description RDF:HREF="ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm">
  &lt;RPM:Name>rpm2html&lt;/RPM:Name>
  &lt;RPM:Version>0.90&lt;/RPM:Version>
  &lt;RPM:Release>1&lt;/RPM:Release>
  &lt;RPM:Distribution>Unknown&lt;/RPM:Distribution>
  &lt;RPM:Vendor>Daniel Veillard&lt;/RPM:Vendor>
  &lt;RPM:Size>13244&lt;/RPM:Size>
  &lt;RPM:URL>http://rpmfind.net/linux/rpm2html/&lt;/RPM:URL>
  &lt;RPM:BuildDate>Sun Mar 29 19:44:53 EST 1998&lt;/RPM:BuildDate>
  &lt;RPM:BuildHost>rpmfind.net&lt;/RPM:BuildHost>
  &lt;RPM:Group>X11/Applications&lt;/RPM:Group>
  &lt;RPM:Packager>Daniel Veillard&lt;/RPM:Packager>
  &lt;RPM:Summary>Translates rpm database into html info&lt;/RPM:Summary>
  &lt;RPM:Sources>ftp://ftp.redhat.com/pub/contrib/SRPM/rpm2html-0.90-1.src.rpm&lt;/RPM:Sources>
  &lt;RPM:Description>
Rpm2html tries to solve 2 big problems one faces when
grabbing a RPM package from a mirror on the net and trying to
install it:

   - it gives more information than just the filename before
     installing the package.
   - it tries to solve the dependency problem by analyzing all
     the Provides and Requires of the set of RPMs. It shows the
     cross references by way of hypertext links.
  &lt;/RPM:Description>
  &lt;RPM:Provides>
     &lt;RDF:Bag>
       &lt;RPM:Resource>rpm2html&lt;/RPM:Resource>
     &lt;/RDF:Bag>
  &lt;/RPM:Provides>
  &lt;RPM:Requires>
     &lt;RDF:Bag>
       &lt;RPM:Resource>libz.so.1&lt;/RPM:Resource>
       &lt;RPM:Resource>libdb.so.2&lt;/RPM:Resource>
       &lt;RPM:Resource>libc.so.6&lt;/RPM:Resource>
       &lt;RPM:Resource>ld-linux.so.2&lt;/RPM:Resource>
     &lt;/RDF:Bag>
  &lt;/RPM:Requires>
  &lt;RPM:Files>
/etc/rpm2html.config
/usr/bin
/usr/bin/rpm2html
/usr/doc/rpm2html-0.85
/usr/doc/rpm2html-0.85/CHANGES
/usr/doc/rpm2html-0.85/Copyright
/usr/doc/rpm2html-0.85/PRINCIPLES
/usr/doc/rpm2html-0.85/README
/usr/doc/rpm2html-0.85/TODO
/usr/doc/rpm2html-0.85/config.small
/usr/man/man1/rpm2html.1
/usr/share/rpm2html/msg.de
/usr/share/rpm2html/msg.es
/usr/share/rpm2html/msg.fr
  &lt;/RPM:Files>
 &lt;/RDF:Description>
&lt;/RDF:RDF>
</pre>
<p>
While this description is definitely not suitable for a human reader, it can be
parsed easily (basic RDF support is already present in Mozilla, for example),
and numerous tools can take advantage of the metadata to process the associated
data.</p>
<p>
As discussed and quickly demonstrated, this metadata can easily be generated
from the packages themselves (this has been done for both RPM and Debian
packages), and it is usually much smaller than the binary packages themselves.</p>
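<p>
As a rough sketch of this extraction step, the following Python fragment builds
a description like the one above from fields that have already been read from a
package, then reads the requirements back. The function names
<code>describe_package</code> and <code>read_requires</code> are hypothetical,
and binding the two namespace URIs with <code>xmlns</code> attributes rather
than the draft namespace declarations shown above is an assumption made so that
a current XML parser accepts the result.</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example above. Declaring them through
# xmlns attributes is an assumption, not the syntax used in the example.
RDF_NS = "http://www.imag.fr/TR/WD-rdf-syntax#/"
RPM_NS = "http://www.rpm.org/"
ET.register_namespace("RDF", RDF_NS)
ET.register_namespace("RPM", RPM_NS)


def describe_package(url, fields, provides, requires):
    """Build an RDF description tree for one package (hypothetical helper)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")
    desc.set(f"{{{RDF_NS}}}HREF", url)
    # Simple fields such as Name, Version or Release become child elements.
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{RPM_NS}}}{name}").text = value
    # Provides and Requires each hold a Bag of Resource elements.
    for tag, resources in (("Provides", provides), ("Requires", requires)):
        holder = ET.SubElement(desc, f"{{{RPM_NS}}}{tag}")
        bag = ET.SubElement(holder, f"{{{RDF_NS}}}Bag")
        for resource in resources:
            ET.SubElement(bag, f"{{{RPM_NS}}}Resource").text = resource
    return root


def read_requires(xml_text):
    """Return the list of required resources found in a description."""
    root = ET.fromstring(xml_text)
    path = f".//{{{RPM_NS}}}Requires/{{{RDF_NS}}}Bag/{{{RPM_NS}}}Resource"
    return [node.text for node in root.findall(path)]


root = describe_package(
    "ftp://ftp.redhat.com/pub/contrib/i386/rpm2html-0.90-1.i386.rpm",
    {"Name": "rpm2html", "Version": "0.90", "Release": "1"},
    provides=["rpm2html"],
    requires=["libz.so.1", "libdb.so.2", "libc.so.6", "ld-linux.so.2"],
)
xml_text = ET.tostring(root, encoding="unicode")
print(read_requires(xml_text))
```

<p>
Reading the fields out of a real RPM or Debian package is left out here, since
it depends on the packaging tools available on the system.</p>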

<h3>Centralizing the Metadata</h3>
<p>
Since one cannot assume that a single entity can generate the metadata for
all the available Linux binary packages, some sort of distributed
work is needed. For example, each maintainer of a Linux distribution or of a
set of packages can extract the metadata and make it available along with
the packages.</p>
<p>
The next step is to centralize this Metadata to build a database that is as
complete as possible. This can be done by mirroring the metadata provided by
the various maintainers. The key point is that, for each package, the metadata
has to be generated once and uploaded once to the repository.</p>
<p>
Why centralize? Simply because metadata taken in isolation is not very useful,
while the cross references one can obtain by gathering and following multiple
references are usually far more useful.</p>
<p>
Moreover, the bigger the database, the higher the probability of answering a
request based on the associated data. Centralizing also makes spreading the
data much easier!</p>
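<p>
To illustrate why gathering helps, here is a minimal sketch that merges the
metadata published by two maintainers and cross-references what each package
requires against what the others provide. The dictionary layout, the function
names and the <code>zlib</code> entry are assumptions made for the example; only
the rpm2html resource names come from the description above.</p>

```python
from collections import defaultdict


def build_provider_index(packages):
    """Map each provided resource to the sorted names of its providers."""
    index = defaultdict(set)
    for name, meta in packages.items():
        for resource in meta["provides"]:
            index[resource].add(name)
    return {resource: sorted(names) for resource, names in index.items()}


def unresolved_requirements(packages, index):
    """List (package, resource) pairs that no gathered package provides."""
    return sorted(
        (name, resource)
        for name, meta in packages.items()
        for resource in meta["requires"]
        if resource not in index
    )


# Hypothetical metadata sets published by two different maintainers.
site_a = {"rpm2html": {"provides": ["rpm2html"],
                       "requires": ["libz.so.1", "libc.so.6"]}}
site_b = {"zlib": {"provides": ["libz.so.1"],
                   "requires": ["libc.so.6"]}}

# Alone, site_a cannot say where libz.so.1 comes from.
alone = unresolved_requirements(site_a, build_provider_index(site_a))

# Once both sets are gathered, the cross reference becomes available.
merged = {**site_a, **site_b}
index = build_provider_index(merged)
together = unresolved_requirements(merged, index)
```

<p>
With only one site, both requirements of rpm2html stay unresolved; after
merging, only <code>libc.so.6</code> is still missing, and the index shows which
package supplies <code>libz.so.1</code>.</p>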

<h3>Propagating the Metadata</h3>
<p>
Once the Metadata has been gathered in a single place, setting up a mirroring
scheme is easy and can efficiently propagate the data
close to the final user. This is the basic mechanism used for FTP mirrors; it is
well known and quite effective. The goal is to offer services based on this
Metadata and to install them as close as possible to the final user.</p>
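<p>
This proposal does not prescribe a particular mirroring protocol. Purely as an
illustration, a mirror could compare a manifest of the central repository with
its own copy and transfer only what changed. The manifest layout (file name
mapped to a content digest), the file names and the function names below are
all assumptions.</p>

```python
import hashlib


def digest(content):
    """Return a hex SHA-256 digest of a metadata file's bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(central, mirror):
    """Compare two manifests (file name to digest) and plan a mirror update.

    Returns the files to fetch (new or changed on the central site) and the
    files to delete (no longer present on the central site).
    """
    fetch = sorted(name for name, d in central.items() if mirror.get(name) != d)
    delete = sorted(name for name in mirror if name not in central)
    return fetch, delete


# Hypothetical manifests: one file unchanged, one updated, one withdrawn.
central = {"rpm2html-0.90-1.rdf": digest(b"new description"),
           "zlib-1.1.2-1.rdf": digest(b"zlib description")}
mirror = {"rpm2html-0.90-1.rdf": digest(b"old description"),
          "zlib-1.1.2-1.rdf": digest(b"zlib description"),
          "obsolete-1.0-1.rdf": digest(b"withdrawn")}

fetch, delete = plan_sync(central, mirror)
```

<p>
Because the metadata is much smaller than the packages it describes, even a
full comparison of this kind stays cheap for the mirror.</p>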

<h3>Exposing the Metadata</h3>
<p>
Once the Metadata is available, a large number of tools can be built to expose
and use its content. A basic idea is to build directories for
searching and locating binary packages; for example, the rpm2html tool is being
modified to accept RDF Metadata as input instead of the binary packages. This
allows the database maintainer to point to a nearby mirror of the binary
packages. Many other tools can be built on the metadata, such as automatic
package checkers, smart installers that follow the metadata to
retrieve and install the latest packages with the correct dependencies, and
easier management of clusters.</p>
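<p>
A smart installer of this kind could, for instance, walk the Requires and
Provides entries to decide what to install and in which order. The sketch below
is one possible approach, not a description of an existing tool: the dictionary
layout, the <code>install_order</code> name, the <code>zlib</code> and
<code>glibc</code> entries and the rule that the first provider found wins are
all assumptions.</p>

```python
def install_order(target, packages, installed=()):
    """Return package names to install, dependencies first.

    `packages` maps a package name to its "provides" and "requires" lists;
    `installed` lists resources already present on the system.
    """
    providers = {}
    for name, meta in packages.items():
        for resource in meta["provides"]:
            providers.setdefault(resource, name)  # first provider wins

    order, seen = [], set()
    available = set(installed)

    def visit(name):
        # The seen set also stops the walk on circular dependencies,
        # in which case the relative order inside the cycle is arbitrary.
        if name in seen:
            return
        seen.add(name)
        for resource in packages[name]["requires"]:
            if resource in available:
                continue
            if resource not in providers:
                raise LookupError(f"{name} requires {resource}, which nothing provides")
            visit(providers[resource])
        order.append(name)

    visit(target)
    return order


# Hypothetical metadata, as it could be read from gathered descriptions.
packages = {
    "rpm2html": {"provides": ["rpm2html"], "requires": ["libz.so.1", "libc.so.6"]},
    "zlib": {"provides": ["libz.so.1"], "requires": ["libc.so.6"]},
    "glibc": {"provides": ["libc.so.6"], "requires": []},
}
print(install_order("rpm2html", packages))
```

<p>
Passing <code>installed=["libc.so.6"]</code> skips the C library, and a
requirement that no known package provides is reported as an error instead of
being silently ignored.</p>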
<address>
<p>
<a href="mailto:daniel@veillard.com">Daniel Veillard</a> </p>
</address>
<p>
$Id: mirroring.html,v 1.7 2001/07/17 22:50:09 veillard Exp $</p>

</body>
</html>