File: tools.xml

<section id="tools">
  <title>Tool Chain</title>
  <para>
    This section discusses which tools are used in implementing
    <application>yaird</application> and why.
  </para>

  <para>
    The application is built as a collection of perl modules.
    Using a scripting language makes consistent error checking
    and building sane data structures a lot easier than shell
    scripting; perl was chosen over python mainly because in
    Debian perl has 'required' status while python is only 'standard'.
    The code follows some conventions:
  </para>

  <para>
    <itemizedlist>
      <listitem>
	<para>
	  Where there are multiple items of a kind, say fstab entries,
	  the perl module implements a class for individual items.
	  All classes share a common base class, <code>Obj</code>,
	  that handles constructor argument validation and that offers
	  a place to plug in debugging code.
	</para>
      </listitem>

      <listitem>
	<para>
	  Object attributes are used via accessor methods to catch
	  typos in attribute names.
	</para>
      </listitem>

      <listitem>
	<para>
	  Objects have a <code>string</code> method that returns
	  a string version of the object.  The string version is
	  not guaranteed to be free of binary data.
	</para>
      </listitem>

      <listitem>
	<para>
	  Where there are multiple items of a kind, say fstab entries,
	  the collection is implemented as a module that is not a
	  class.  There is a function <code>all</code> that returns a
	  list of all known items, and functions <code>findByXxx</code>
	  to retrieve an item where the Xxx attribute has a given
	  value.  There is an <code>init</code> function that
	  initializes the collection; this is called automatically
	  upon first invocation of <code>all</code> or
	  <code>findByXxx</code>.
	  Collections may have convenience functions
	  <code>findXxxByYyy</code>: return attribute Xxx, given a
	  value for attribute Yyy.
	</para>
      </listitem>

    </itemizedlist>
  </para>
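
  <para>
    The conventions above can be sketched as follows.  This is a
    minimal illustration, not the actual
    <application>yaird</application> code: the names
    <code>FsEntry</code> and <code>FsTab</code>, the
    <code>attributes</code> hook, the attribute names and the
    stand-in data are assumptions made for the example.  Only
    <code>Obj</code>, <code>string</code>, <code>init</code>,
    <code>all</code>, <code>findByXxx</code> and
    <code>findXxxByYyy</code> come from the description above.
  </para>

  <programlisting language="perl"><![CDATA[
use strict;
use warnings;

# Hypothetical base class: checks constructor arguments against the
# attribute names that each subclass lists in attributes().
package Obj;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    my %known = map { $_ => 1 } $self->attributes;
    for my $key (sort keys %args) {
        die "$class: unknown attribute '$key'\n" unless $known{$key};
    }
    for my $key ($self->attributes) {
        die "$class: missing attribute '$key'\n" unless exists $args{$key};
        $self->{$key} = $args{$key};
    }
    return $self;
}

# One class per kind of item.  Attributes are read through accessor
# methods, so a misspelled attribute name is a fatal error rather
# than a silent undef.
package FsEntry;
our @ISA = ('Obj');
sub attributes { return qw(origin mnt type) }
sub origin { return $_[0]{origin} }
sub mnt    { return $_[0]{mnt} }
sub type   { return $_[0]{type} }
sub string {
    my $self = shift;
    return join (' ', $self->origin, $self->mnt, $self->type);
}

# The collection is a plain module, not a class.
package FsTab;
my $fsTab;    # undef until init has run

sub init {
    # Stand-in data; a real init would parse /etc/fstab.
    $fsTab = [
        FsEntry->new (origin => '/dev/hda1', mnt => '/',     type => 'ext3'),
        FsEntry->new (origin => '/dev/hda2', mnt => '/home', type => 'xfs'),
    ];
}

sub all {
    init () unless defined $fsTab;    # initialise on first use
    return @{$fsTab};
}

sub findByMnt {
    my ($mnt) = @_;
    for my $entry (all ()) {
        return $entry if $entry->mnt eq $mnt;
    }
    return;
}

# Convenience function: attribute type, given a value for mnt.
sub findTypeByMnt {
    my ($mnt) = @_;
    my $entry = findByMnt ($mnt);
    return defined ($entry) ? $entry->type : undef;
}

package main;
my $home = FsTab::findByMnt ('/home');
print $home->string, "\n";
]]></programlisting>

  <para>
    In this sketch, a call such as <code>$home->mount</code> fails
    with a "can't locate object method" error, whereas a direct hash
    lookup with the same typo would quietly return an undefined value.
  </para>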

  <para>
    The generated initrd image needs a command interpreter;
    the choice of command interpreter is exclusively determined
    by the image generation template.
    At this point, both Debian and Fedora templates use the
    <application>dash</application> shell, for historical reasons only.
    Presumably <application>busybox</application> could be used to build a
    smaller image.  However, support for initramfs requires a complicated
    construction involving a combination of mount, chroot and chdir;
    to do that reliably, <application>nash</application> as used in Fedora
    seems a more attractive option.
  </para>

  <para>
    Documentation is in docbook format, since it's widely supported,
    supports numerous output formats, has better separation between
    content and layout than texinfo, and provides better guarantees
    against malformed HTML than texinfo.
  </para>

  <simplesect>
    <title>Autoconf</title>
    <para>
      GNU automake is used to build and install the application,
      where 'building' is perhaps too big a word for adding the location
      of the underlying modules to the wrapper script.
      The reasons for using automake: it provides packagers with a
      well known mechanism for changing installation directories,
      and it makes it easy for developers to produce a cruft-free
      and reproducible tarball based on the tree extracted from
      version control.
    </para>
  </simplesect>

  <simplesect>
    <title>C Library</title>
    <para>
      The standard C library under Linux is glibc.  This is big:
      1.2 MB, where an alternative implementation, klibc, is only 28 KB.
      The reason klibc can be so much smaller than glibc is that a
      lot of features of glibc, like NIS support, are not relevant for
      applications that need to do basic stuff like loading an IDE driver.
    </para>

    <para>
      There are other small libc implementations: in the embedded world,
      dietlibc and uClibc are popular.  However, klibc was specifically
      developed to support the initial image: it's intended to be included
      with the mainline kernel and allow moving a lot of startup magic out
      of the kernel into the initial image.  See 
      <ulink url="http://marc.theaimsgroup.com/?m=101070502919547">
	<citetitle>
	  LKML: [RFC] klibc requirements, round 2</citetitle></ulink>
      for requirements on klibc; the
      <ulink url="http://www.zytor.com/mailman/listinfo/klibc">
	mailing list</ulink> is the most current
      source of information.
    </para>

    <para>
      Recent versions of klibc (1.0 and later) include a wrapper around
      gcc, named klcc, that will compile a program with klibc.  This means
      <application>yaird</application> does not need to include klibc,
      but can easily be configured to use klibc rather than glibc.
      Of course this will only pay off if <emphasis>every</emphasis>
      executable on the initial image uses klibc.
    </para>

    <para>
      <application>Yaird</application> does not have to be extended in
      order to support klibc, but it is necessary to avoid assumptions
      about which shared libraries are used.  This is discussed in 
      <xref linkend="shlibs"/>.
    </para>
  </simplesect>

  <simplesect>
    <title>Template Processing</title>
    <para>
      This section discusses the templates used to transform
      high-level actions to lines of script in the generated image.
      These templates are intended to cope with small differences
      between distributions: a shell that is named
      <application>dash</application> in Debian and
      <application>ash</application> in Fedora for example.
      By processing the output of <application>yaird</application>
      through a template, we can confine the tuning of
      <application>yaird</application> for a specific distribution
      to the template, without having to touch the core code.
    </para>

    <para>
      One important function of a template library is to enforce
      a clear separation between program logic and output formatting:
      there should be no way to put perl fragments inside a template.
      See <ulink url="http://www.stringtemplate.org/">StringTemplate</ulink>
      for a discussion of what is needed in a templating system, plus
      a Java implementation.
    </para>

    <para>
      Let's consider a number of possible templating solutions:
      <itemizedlist>

	<listitem>
	  <para>
	    <ulink url="http://www.template-toolkit.org/">
	     Template Toolkit</ulink>:
	    widely used, not in perl core distribution, does not
	    prevent mixing of code and templates.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    <ulink url="http://search.cpan.org/dist/Text-Template/lib/Text/Template.pm">
	      Text::Template</ulink>:
	    not in perl core distribution, does not
	    prevent mixing of code and templates.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    Some XSLT processor.  Not in core distribution,
	    more suitable for file-to-file transformations
	    than for expanding in-process data; overkill.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    <ulink url="http://search.cpan.org/~samtregar/HTML-Template-2.7/Template.pm">
	      HTML-Template</ulink>:
	    not in perl core distribution,
	    prevents mixing of code and templates,
	    simple, no dependencies, dual GPL/Artistic license.
	    Available in Debian as
	    <application>libhtml-template-perl</application>,
	    in Fedora 2 as perl-HTML-Template, dropped from Fedora 3,
	    but available via
	    <ulink url="http://download.fedora.redhat.com/pub/fedora/linux/extras/">
	      Fedora Extras</ulink>.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    A home grown templating system: a simple system such as the
	    HTML-Template module is over 100Kb.  We can cut down on that
	    by dropping functions we don't immediately need, but the effort
	    to get a tested and documented implementation remains substantial.
	  </para>
	</listitem>
	
      </itemizedlist>
    </para>

    <para>
      The HTML-Template approach is the best match for our
      requirements, so it is used in <application>yaird</application>.
    </para>
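
    <para>
      As an illustration of how such a template keeps script text
      apart from perl code, here is a minimal sketch that assumes
      HTML-Template is installed.  The template text, the parameter
      names <code>shell</code>, <code>modules</code> and
      <code>path</code>, and the module paths are made up for the
      example and are not taken from the
      <application>yaird</application> templates; only
      <code>new</code>, <code>param</code> and <code>output</code>
      are HTML-Template calls.
    </para>

    <programlisting language="perl"><![CDATA[
use strict;
use warnings;
use HTML::Template;

# Hypothetical template for a fragment of the generated script.
# It contains only placeholders and a loop, no perl code.
my $text = <<'EOT';
#!<TMPL_VAR NAME=shell>
<TMPL_LOOP NAME=modules>/sbin/insmod <TMPL_VAR NAME=path>
</TMPL_LOOP>
EOT

my $template = HTML::Template->new (scalarref => \$text);

# The program decides which values go in; the template decides
# what the resulting lines of script look like.
$template->param (
    shell   => '/bin/dash',
    modules => [
        { path => '/lib/modules/ide-core.ko' },
        { path => '/lib/modules/ide-disk.ko' },
    ],
);

print $template->output;
]]></programlisting>

    <para>
      In this sketch, supporting a distribution with a different
      shell or a different module loader means changing the value
      passed for <code>shell</code> or the template text, not the
      code that computes the list of modules.
    </para>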

  </simplesect>

  <simplesect>
    <title>Configuration Parsing</title>

    <para>
      <application>Yaird</application> has a fair number of
      configuration items: templates containing a list of files and
      trees, named shell script fragments with a value that spans
      multiple lines.  If future versions of the application are going
      to be more flexible, the number of configuration items is only
      going to grow.  Somehow this information has to be passed to the
      application; an overview of the options follows.

      <itemizedlist>
	<listitem>
	  <para>
	    Configuration as part of the program.  Simply hard-code
	    all configuration choices, and structure the program so that
	    the configuration part is a well defined part of the
	    program.  The advantage is that there is no need for any
	    infrastructure, the disadvantage is that there is no clear
	    boundary where problems can be reported, and that it
	    requires the user to be familiar with the programming
	    language.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    <ulink
	    url="http://search.cpan.org/~abw/AppConfig-1.56/lib/AppConfig.pm">AppConfig</ulink>.
	    A mature perl module that parses configuration files in a
	    format similar to Win32 "INI" files.  Widely used, stable,
	    flexible, well-documented, with as added bonus the fact that
	    it unifies options given on the command line and in the
	    configuration file.  An ideal solution, except for the fact
	    that we need a more complex configuration than can
	    conveniently be expressed in INI-file format.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    An XML based configuration format.  XML parsers for perl are
	    readily available.  The advantage is that it's an industry
	    standard; the disadvantages are that the markup can get very
	    verbose and that support for input validation is limited
	    (XML::LibXML mentions a binding for RelaxNG, but the code is
	    missing, and defining an input format in XML-Schema ... just
	    say no).
	  </para>
	</listitem>

	<listitem>
	  <para>
	    <ulink url="http://www.yaml.org/">YAML</ulink> is a data
	    serialisation format that is a lot more readable than XML.
	    The disadvantage is that it's not as widely known as XML,
	    that it's an indentation based language (so confusion over tabs
	    versus spaces can arise) and that support for input validation
	    is completely missing.
	  </para>
	</listitem>

	<listitem>
	  <para>
	    A custom made configuration language, based on
	    <ulink
	    url="http://search.cpan.org/dist/Parse-RecDescent/">Parse::RecDescent</ulink>,
	    a widely used, mature module to do recursive descent parsing
	    in perl.  Using a custom language means we can structure the
	    language to minimise opportunities for mistakes, can provide
	    relevant error messages, can support complex configuration
	    structures and can easily parse the configuration file to a tree
	    format that's suitable for further processing.  The disadvantage
	    is that a custom language is yet another syntax to learn.
	  </para>
	</listitem>

      </itemizedlist>
    </para>

    <para>
      Building a recursive descent parser seems the best match for this
      application.
    </para>
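
    <para>
      To give an impression of what this involves, here is a minimal
      sketch that assumes Parse::RecDescent is installed.  The
      configuration syntax, the rule names and the resulting tree
      layout are invented for the example and should not be read as
      the actual <application>yaird</application> configuration
      language.
    </para>

    <programlisting language="perl"><![CDATA[
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: named templates, each holding a list of
# files and trees.  Every rule returns a plain perl data structure.
my $grammar = <<'EOG';
config:  section(s) /^\Z/
             { $return = $item[1]; }
section: 'TEMPLATE' name 'BEGIN' entry(s?) 'END' 'TEMPLATE'
             { $return = { name => $item[2], entries => $item[4] }; }
entry:   /FILE|TREE/ path
             { $return = { kind => $item[1], path => $item[2] }; }
name:    /[a-z][a-z0-9_]*/i
path:    /"[^"]*"/
             { $return = substr ($item[1], 1, -1); }
EOG

my $parser = Parse::RecDescent->new ($grammar)
    or die "bad grammar\n";

my $input = <<'EOC';
TEMPLATE modprobe
BEGIN
    FILE "/sbin/modprobe"
    TREE "/lib/modules"
END TEMPLATE
EOC

# The start rule is called as a method; it returns undef if the
# input does not match the grammar.
my $tree = $parser->config ($input);
die "cannot parse configuration\n" unless defined $tree;

for my $section (@{$tree}) {
    print "template $section->{name}\n";
    print "  $_->{kind} $_->{path}\n" for @{$section->{entries}};
}
]]></programlisting>

    <para>
      The result is a tree of hashes and arrays that the rest of the
      program can walk without knowing anything about the concrete
      syntax of the configuration file.
    </para>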

  </simplesect>
</section>