Unity: multi-syntax unit parsing
================================

This is the unity library, which is able to parse scientific unit
specifications using a variety of syntaxes.

  * Version: 1.1
  * Repo:    https://heptapod.host/nxg/unity
  * Release: 1.1 (dc4b67299583, 2023 October 22)

The code is made available under the terms of the 2-clause BSD licence.
See the file LICENCE.txt, in the distribution, for the copyright
statement and the terms.

The project's canonical ‘home’ URI is <https://purl.org/nxg/dist/unity>.
That URI redirects elsewhere, but **you should quote or bookmark this
`purl.org` URI, rather than the one it redirects to**, as the permanent
name of the library (I have relocated the repository more than once).
That home page contains further downloads and formatted documentation.

The most recent version of the library release has DOI
[10.5281/zenodo.6949817](https://doi.org/10.5281/zenodo.6949817)
(that DOI always points to the most recent version; specific releases have distinct DOIs).

There is an issue tracker at [heptapod](https://heptapod.host/nxg/unity).

Note that, somewhat unfortunately, the namespace URIs for the Unity
subjects are `http://bitbucket.org/nxg/unity/ns/unit#`,
`http://bitbucket.org/nxg/unity/ns/syntax#` and
`http://bitbucket.org/nxg/unity/ns/schema#`
(the library was at one time hosted at bitbucket,
but the repository has been migrated away).
Although these are of course somewhat arbitrary,
it is undesirable that they are still at bitbucket.
It would be good to change these, but that probably can't happen
before some future version of the software
(this is [issue #1](https://heptapod.host/nxg/unity/-/issues/1)).


Goals
-----

The library was written with the following goals:

  * Producing formal grammars of the existing and proposed standard unit
    syntaxes, for reference purposes.

  * Participating in the [VOUnit standardisation process][vounits]
    by acting as a locus for experimentation with syntaxes and
    proposed standards.

  * To that end, discovering edge cases and producing test cases.

  * Producing parsing libraries which are fast and standalone, and so
    could be conveniently used by other software; this also acts as an
    implementation of the VOUnits standard.  The distribution
    is buildable using no extra software beyond a Java and a C
    compiler.  It is also a goal that the distributed Java is
    source-compatible with Java 1.5 (though this isn't automatically
    tested, so bug reports are welcome).

It is not a goal for this library to do any processing of the
resulting units, such as unit conversion or arithmetic.  Parsing unit
strings isn't a deep or particularly interesting problem, but it's
more fiddly than one might expect, and so it's useful for it to be
done properly once.

A major goal of the VOUnits process is to identify a syntax for unit
strings (called 'VOUnits') which is as nearly as possible in the
intersection of the various existing standards.  The intention is that
if file creators target the VOUnits syntax, the resulting string has
a chance of being readable by as many other parsers as possible.  This
isn't completely possible (the OGIP syntax doesn't allow dots as
multipliers), but we can get close.
Version 1.0 of the [VOUnits standard][vounits] was approved on 2014 May 23.

Although the library was produced as part of the VOUnits process,
its use is not restricted to that syntax, and it should be useful for each of
the syntaxes listed below.

Outputs
-------

**Yacc grammars for the three well-known syntaxes, plus a proposed 'VOUnits'
grammar**  These are consistent, in the sense that any string
which parses in more than one of these grammars means the same thing in each
case (ignoring questions of per-syntax valid units).  The 'VOUnits' syntax
is almost in the intersection of the three, in the sense that anything which
conforms to that grammar will parse (and mean the same thing) in the others.
The only exception is that the OGIP grammar uses '`*`' for multiplication, and the
others accept '`.`', but that could be got around with a fairly simple character
substitution.  That is, a string written out in the VOUnits syntax can be read
by almost any parser for the other syntaxes.
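
The character substitution mentioned above might look something like the following.  This is a minimal sketch and not part of the library: the class `DotToStarSketch` and method `dotsToStars` are hypothetical names, and the rule for sparing decimal points (a dot with a digit on both sides is left alone) is an assumption that a real converter would need to check against the grammars.

```java
/** Sketch only: rewrites '.' multipliers as '*' for OGIP-style readers. */
final class DotToStarSketch {

    /**
     * Replaces every '.' with '*', unless the dot has a digit on both
     * sides, in which case it is assumed to be a decimal point.
     */
    static String dotsToStars(String units) {
        // First alternative: a dot not preceded by a digit.
        // Second alternative: a dot not followed by a digit.
        return units.replaceAll("(?<![0-9])\\.|\\.(?![0-9])", "*");
    }

    public static void main(String[] args) {
        System.out.println(dotsToStars("kg.m**2"));   // prints kg*m**2
        System.out.println(dotsToStars("1.5kg.m"));   // prints 1.5kg*m
    }
}
```

The heuristic works on the raw string rather than on a parsed unit, so it knows nothing about quoted units or other context.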

**Parsers in multiple languages** This distribution includes parsers in Java and
C.  Thus the core content here is demonstrably language-agnostic.  Python would
be an obvious next language.

**Test cases** There are nearly 300 test cases, consisting of unit
strings which either should or should not parse, plus a smaller number
of functional tests.

**A collection of 'known' units** There are multiple collections of
these in circulation in different libraries, but this library gathers
them together and generates per-language lookups of the information.
The [VOUnits document][vounits] includes discussion of the various
compromises necessary here.

Parsing units and prefixes
--------------------------

The grammars defined by this library do not cover the parsing of unit prefixes
since (as it turns out) this cannot be usefully done at this level, and the
grammars identify, in the terminal `STRING`, only the combination of
prefix+unit.  These are subsequently parsed in the following manner:

  1. if the whole string is a 'known unit' then this is the base unit (so
     'pixel' is recognised in some of the syntaxes as a unit, and a 'Pa' is a
     pascal and not a peta-year);

  2. or if the first character in the `STRING` is one of the SI prefixes (or the
     first two are 'da') and at least one further character follows the prefix,
     then that's a prefix and the rest of the string is the unit (so 'pixe' would
     be parsed as a pico-ixe);

  3. or else the whole thing is a unit (so 'wibble' is an unknown unit called
     the 'wibble', 'm' is the metre and not a milli-nothing, but 'furlong' would
     be a femto-urlong).

That is, validity checking – checking whether this is an allowed unit,
or whether it's allowed to have an SI prefix – happens at a later stage
from the parsing of the units string and only on request, since it's
essentially an auxiliary parse.  This (a) avoids the cumbersomeness of
doing this check earlier, (b) separates the _grammatical_ error of having a star
in the wrong place from the stylistic or semantic error of using an
inappropriate unit, and (c) retains the freedom to use odd
units if someone really wants to.

The library also recognises the binary prefixes (kibi, mebi, and so on) of ISO/IEC 80000-13.

Summary:

  * `pixel` --> 'pixel' in the FITS and OGIP syntaxes, the pico-ixel in CDS
  * `furlong/pixe` --> femto-urlong per pico-ixe
  * `m` --> metre in all syntaxes
  * `mm` --> millimetre
  * `dam` --> dekametre (not the deci-`am`)
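
The three-step rule and the summary above can be sketched in a few lines of Java.  This is an illustration of the rule as described here, not the library's own code: the class `PrefixSplitSketch`, the method `split` and the small known-unit sets are all hypothetical, and trying 'da' before the single-character 'd' is an assumption made to agree with the `dam` example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: the three-step prefix/unit split described above. */
final class PrefixSplitSketch {
    // Single-character SI prefix symbols; "da" is handled separately.
    private static final String SI_PREFIXES = "yzafpnumcdhkMGTPEZY";

    /** Returns {prefix, unit}; the prefix is "" when there is none. */
    static String[] split(String s, Set<String> knownUnits) {
        // 1. A known unit is taken whole, so "Pa" is the pascal.
        if (knownUnits.contains(s)) {
            return new String[] {"", s};
        }
        // 2. "da" counts as a prefix only if something follows it...
        if (s.startsWith("da") && s.length() > 2) {
            return new String[] {"da", s.substring(2)};
        }
        // ...and the same goes for a single-character prefix.
        if (s.length() > 1 && SI_PREFIXES.indexOf(s.charAt(0)) >= 0) {
            return new String[] {s.substring(0, 1), s.substring(1)};
        }
        // 3. Otherwise the whole string is an (unknown) unit.
        return new String[] {"", s};
    }

    public static void main(String[] args) {
        // Hypothetical, heavily abbreviated known-unit sets.
        Set<String> fits = new HashSet<>(Arrays.asList("m", "Pa", "pixel"));
        Set<String> cds = new HashSet<>(Arrays.asList("m", "Pa"));
        System.out.println(Arrays.toString(split("pixel", fits)));   // [, pixel]
        System.out.println(Arrays.toString(split("pixel", cds)));    // [p, ixel]
        System.out.println(Arrays.toString(split("furlong", fits))); // [f, urlong]
        System.out.println(Arrays.toString(split("dam", fits)));     // [da, m]
    }
}
```

The split is purely lexical, so deciding whether 'urlong' is a sensible unit is left to a later validation step.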

Notes
=====

The recognised syntaxes are:

* **fits**:
    FITS, 3.0 Sect.4.3 ([W.D. Pence et al., A&A 524, A42, 2010][fits]);
    v4.0 Sect.4.3 ([FITS standards page][fitsspec]);
    and further comments in the [FITS WCS IV paper][fitswcsiv].

* **ogip**:
    [OGIP memo OGIP/93-001, 1993][ogip].

* **cds**:
    [Standards for Astronomical Catalogues, Version 2.0, section 3.2, 2000][cds].

* **vounits**:
    The VOUnits syntax;
    this is specified by the [VOUnits specification][vounits].
    This is derived from the FITS syntax, with some rationalisations
    (for example discarding `m(2)` as an alternative for `m**2`) and
    some additions (for example including quoted strings such as
    `'jupiterMass'` to explicitly mark ‘unknown’ units).

The grammars are available in `src/grammar/unity.y`.  Note that this
file is pre-processed before it is fed into a parser generator, and
isn't a valid yacc file as it stands; see the relevant targets in
`src/java` and `src/c`.  Summary versions of the four syntaxes, as
yacc/bison grammars, are included in the specification document, and in
the distribution.

The grammars are implemented by (at present) two libraries, one in C
and one in Java.  Each of these generates its parsers directly from
the grammars.  See `src/c/docs` and `src/java/docs` for documentation.

The Java implementation has, and will probably continue to have, more
functionality than the C one.  That said, I'm open to suggestions
about features that are currently in the Java version that could
usefully be ported to the C version.

Each of the implementations supports reading and writing each of the
grammars, plus LaTeX output
(supported by the LaTeX `siunitx` package).

The main testcases – a set of unit strings and the intended parse
results – are in `src/grammar/testcases*.csv`.  There are also
library-specific unit tests within the source trees.

Examples
--------

There is example code in directory `examples/`.

If you want to experiment with the library, there are a couple of
command-line tools to do so.  Build `src/java/unity.jar`
or `src/c/unity`.  Below, we illustrate behaviour with the Java version.

Basic use: parse with the CDS syntax and output with the OGIP one:

    % java -jar unity.jar -icds -oogip mm2/s
    mm**(2) /s

Now CDS to FITS, with verbose/validation output:

    % java -jar unity.jar -icds -ofits -v mm2/s
    mm2 s-1
    Checking <mm2/s>, in input syntax cds:
    check: all units recognised?        yes
    check: all units recommended?       yes
    check: all constraints satisfied?   yes
    Result:
      mm^2.0               (10^-3 Metre)^2.0
        Metre(http://qudt.org/vocab/quantity#Length) / m
      s^-1.0               (Second)^-1.0
        Second(http://qudt.org/vocab/quantity#Time) / s

FITS to CDS syntax.  Note that the erg is a recognised unit in FITS.

    % java -jar unity.jar -ifits -ocds -v merg/s
    merg/s
    Checking <merg/s>, in input syntax fits:
    check: all units recognised?        yes
    check: all units recommended?       no
    check: all constraints satisfied?   no
    Result:
      merg                 (10^-3 Erg)
        Erg(http://qudt.org/vocab/quantity#EnergyAndWork) / erg
      s^-1.0               (Second)^-1.0
        Second(http://qudt.org/vocab/quantity#Time) / s

The same thing, but CDS to FITS.  The erg is _not_ a recognised CDS unit.

    % java -jar unity.jar -icds -ofits -v merg/s
    merg s-1
    Checking <merg/s>, in input syntax cds:
    check: all units recognised?        no
    check: all units recommended?       no
    check: all constraints satisfied?   yes
    Result:
      merg                 (10^-3 erg)
        (unrecognised)
      s^-1.0               (Second)^-1.0
        Second(http://qudt.org/vocab/quantity#Time) / s

If we add the `-g` (guess) option, however, then the library is
willing to guess that the `erg` is the Erg, but it's still not a
recommended unit.

    % java -jar unity.jar -icds -ofits -v -g merg/s
    merg s-1
    Checking <merg/s>, in input syntax cds:
    check: all units recognised?        no
           ...with guessing?            no
    check: all units recommended?       no
    check: all constraints satisfied?   yes
    Result:
      merg                 (10^-3 Erg), guessed
        Erg(http://qudt.org/vocab/quantity#EnergyAndWork) / erg
      s^-1.0               (Second)^-1.0
        Second(http://qudt.org/vocab/quantity#Time) / s

Similarly, if we parse `mm/degree`, then the library fails to
recognise `degree` (since the preferred unit string is `deg`), but
will recognise it if guessing is turned on.

    % java -jar unity.jar -icds -ofits -v -g mm/degree
    mm deg-1
    Checking <mm/degree>, in input syntax cds:
    check: all units recognised?        no
           ...with guessing?            yes
    check: all units recommended?       yes
    check: all constraints satisfied?   yes
    Result:
      mm                   (10^-3 Metre)
        Metre(http://qudt.org/vocab/quantity#Length) / m
      deg^-1.0             (Degree Angle)^-1.0, guessed
        Degree Angle(http://qudt.org/vocab/quantity#PlaneAngle) / deg

In the latter cases, the `-v` option _validates_ the input string
against various constraints.  The expression `mm/s` is completely valid
in all the syntaxes.  In the FITS syntax, the erg is a recognised
unit, but it is deprecated; although it is recognised, it is not
permitted to have SI prefixes.  In the CDS syntax, the erg is neither
recognised nor (a fortiori) recommended; since there are no
constraints on it in this syntax, it satisfies all of them (this
latter behaviour is admittedly slightly counterintuitive).
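
To make the three checks concrete, here is a minimal sketch of how they could be computed for a single prefix+unit pair.  It is not the library's API: `ValidationSketch`, `UnitInfo`, `check` and the two tiny per-syntax tables are hypothetical, and the only constraint modelled is 'may this unit take an SI prefix?'.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: the three validation checks for one prefix+unit pair. */
final class ValidationSketch {

    /** Hypothetical per-syntax facts about one unit. */
    static final class UnitInfo {
        final boolean recommended;
        final boolean prefixAllowed;

        UnitInfo(boolean recommended, boolean prefixAllowed) {
            this.recommended = recommended;
            this.prefixAllowed = prefixAllowed;
        }
    }

    /** Returns {recognised, recommended, constraintsSatisfied}. */
    static boolean[] check(String prefix, String unit, Map<String, UnitInfo> syntax) {
        UnitInfo info = syntax.get(unit);
        if (info == null) {
            // Unknown unit: there are no constraints, so none is violated.
            return new boolean[] {false, false, true};
        }
        boolean constraintsOk = prefix.isEmpty() || info.prefixAllowed;
        return new boolean[] {true, info.recommended, constraintsOk};
    }

    public static void main(String[] args) {
        // Assumed table contents, following the description above.
        Map<String, UnitInfo> fits = new HashMap<>();
        fits.put("erg", new UnitInfo(false, false)); // deprecated, no prefixes
        Map<String, UnitInfo> cds = new HashMap<>(); // the erg is absent

        boolean[] inFits = check("m", "erg", fits);
        boolean[] inCds = check("m", "erg", cds);
        System.out.println("fits: " + inFits[0] + " " + inFits[1] + " " + inFits[2]);
        System.out.println("cds:  " + inCds[0] + " " + inCds[1] + " " + inCds[2]);
    }
}
```

In this sketch the `merg` case gives recognised/not recommended/constraint violated for the FITS table, and unrecognised/not recommended/constraints satisfied for the CDS one, where the last result is the vacuous case noted above.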

Then we ask the library to ‘guess’ what an `erg` is,
since it is not recognised in the CDS syntax.  It correctly guesses an
Erg, but although this parsed properly as an Erg (as opposed to the
unknown unit `erg`), it is still not a recognised unit.  The guessing
process can do a little more, and for example can recognise `degrees`
as Degrees (instead of preferred `deg`, and as opposed to deci-egrees),
and interpret improperly pluralised `ergs` as the Erg as well.
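
One plausible shape for this kind of guessing is a lookup by unit name that falls back to stripping a trailing 's'.  The sketch below only illustrates that idea, and the library's actual heuristics may well differ: `GuessSketch`, `guess` and the three-entry name table are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: guess a unit from a name or a sloppy plural. */
final class GuessSketch {
    // Hypothetical table mapping names and symbols to a preferred symbol.
    private static final Map<String, String> NAMES = new HashMap<>();
    static {
        NAMES.put("deg", "deg");
        NAMES.put("degree", "deg");
        NAMES.put("erg", "erg");
    }

    /** Returns the preferred symbol, or empty if no guess can be made. */
    static Optional<String> guess(String s) {
        String hit = NAMES.get(s);
        if (hit == null && s.length() > 1 && s.endsWith("s")) {
            // Retry without the final 's', so "ergs" is looked up as "erg".
            hit = NAMES.get(s.substring(0, s.length() - 1));
        }
        return Optional.ofNullable(hit);
    }

    public static void main(String[] args) {
        System.out.println(guess("degrees")); // Optional[deg]
        System.out.println(guess("ergs"));    // Optional[erg]
        System.out.println(guess("wibble"));  // Optional.empty
    }
}
```

Trying the whole string against the table before any prefix splitting is what keeps `degrees` from being read as deci-egrees.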

The C library can also validate unit strings in the same way, but it
does not have this guessing functionality.  It has a corresponding program `unity`.

The library of ‘known units’ draws on v1.1 of the excellent
[QUDT](http://qudt.org) units ontology.  See `src/qudt` in the repository.


Portability
-----------

The library is developed on macOS, and sporadically tested on Debian
and FreeBSD.  However there isn't any CI set up, so it's not
systematically tested on a variety of platforms.


Building
--------

The usual:

    % ./configure
    % make
    % make check
    % make install

The build process requires GNU make (as opposed to BSD make).

Pre-requirements: distribution tarball
--------------------------------------

**No library dependencies.**  To build from a distribution, the only
pre-requirements are a C compiler and a JDK (JDK 8 or later).  You can
build either or both of the C and Java libraries, at your option
(eg, `cd src/c; make check`).

If the [JUnit jar](https://junit.org) is in the CLASSPATH,
then `make check` will run more tests than if it's absent.


Pre-requirements: repository checkout
-------------------------------------

See the file [README-developer.md](README-developer.md) for instructions on
building from a repository checkout.  These instructions are
unfortunately a little intricate, and shouldn't be necessary for most
users.


Limitations
-----------

  * Currently ignores some of the odder unit restrictions (such as the
    OGIP requirement that 'Crab' can have a 'milli' prefix, but no
    other SI prefixes).
  * Serialising parsed units (for example using the `-o` option of the
    command-line tools) will generally produce a valid string in the
    target syntax, but isn't guaranteed to do so.  For example,
    `log(1e3W)` is a valid VOUnits string, but it doesn't have a
    corresponding valid string in (our interpretation of) the FITS
    syntax.  The library does the best it can.

Hacking
-------

The distributed source set is assembled using quite a lot of
preprocessing, involving parser- and documentation-generators.  It's
not intended to be a useful starting point for hacking on the
software.  For that, see the [instructions on building from a
repository checkout](README-developer.md).


[vounits]:	https://www.ivoa.net/documents/VOUnits/
[cds]:		https://vizier.u-strasbg.fr/vizier/doc/catstd-3.2.htx
[fits]:		https://doi.org/10.1051/0004-6361/201015362
[fitsspec]:	https://fits.gsfc.nasa.gov/fits_standard.html
[fitswcsiv]:	https://doi.org/10.1051/0004-6361/201424653
[ogip]: 	https://heasarc.gsfc.nasa.gov/docs/heasarc/ofwg/docs/general/ogip_93_001/
[dist]:		https://purl.org/nxg/dist/unity

[Norman Gray](https://nxg.me.uk)  
2023 October 22