XML/Unix Processing Tools Documentation

Usage

There are six tools. They are all simple filters, reading information from standard input in one format and writing the same information to standard output in a different format.

Tool name   Input   Output
xml2        XML     Flat
html2       HTML    Flat
csv2        CSV     Flat
2xml        Flat    XML
2html       Flat    HTML
2csv        Flat    CSV

The ``Flat'' format is specific to these tools. It is a syntax for representing structured markup in a way that makes it easy to process with line-oriented tools. The same format is used for HTML, XML, and CSV. In fact, you can think of html2 as converting HTML to XHTML and running xml2 on the result; 2html and 2xml are related in the same way.

CSV (comma-separated value) files are less expressive than XML or HTML (CSV has no hierarchy), so xml2 | 2csv is a lossy conversion.
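
To make the "line-oriented" point concrete, here is a minimal Python sketch (not part of the tools) that treats Flat text the way grep or cut would. It assumes each Flat line is a path, optionally followed by `=` and a value, as in the File Format examples below. The helper names `parse_flat_line` and `values_at` are hypothetical, and the sample lines are copied from the person example.

```python
def parse_flat_line(line):
    """Split one Flat line into (path, value).

    Assumption: the first '=' separates path from value; a line with
    no '=' is a bare path and gets value None.
    """
    path, sep, value = line.rstrip("\n").partition("=")
    return path, (value if sep else None)


def values_at(lines, wanted_path):
    """Return every value whose path equals wanted_path, in input order."""
    result = []
    for line in lines:
        path, value = parse_flat_line(line)
        if path == wanted_path and value is not None:
            result.append(value)
    return result


# Flat lines taken from the person example in this document.
sample = [
    "/person/name=Juan Doé",
    "/person/occupation=Zillionaire",
    "/person/pet=Dogcow",
    "/person/address=123 Camino Real",
    "/person/address/city=El Dorado",
    "/person/address/state=AZ",
    "/person/address/zip=12345",
    "/person/important",
]

print(values_at(sample, "/person/address/city"))  # ['El Dorado']
```

Because every line carries its full path, no parser state is needed to know where a value belongs; that is what lets ordinary text tools do the same job in a pipeline.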

File Format

To use these tools effectively, it's important to understand the ``Flat'' format. Unfortunately, I'm lazy and sloppy; rather than provide a precise definition of the relationship between XML and ``Flat'', I will simply give you a pile of examples and hope you can generalize correctly. (Good luck!)

XML    Flat equivalent
<thing/> /thing

<thing><subthing/></thing> /thing/subthing

<thing>stuff</thing> /thing=stuff

<thing>
<subthing>substuff</subthing>
stuff
</thing>
/thing/subthing=substuff
/thing=stuff

<person>
<name>Juan Doé</name>
<occupation>Zillionaire</occupation>
<pet>Dogcow</pet>
<address>
123 Camino Real
<city>El Dorado</city>
<state>AZ</state>
<zip>12345</zip>
</address>
<important/>
</person>
/person/name=Juan Doé
/person/occupation=Zillionaire
/person/pet=Dogcow
/person/address=123 Camino Real
/person/address/city=El Dorado
/person/address/state=AZ
/person/address/zip=12345
/person/important

<collection>
<group>
<thing>stuff</thing>
<thing>stuff</thing>
</group>
</collection>
/collection/group/thing=stuff
/collection/group/thing
/collection/group/thing=stuff

<collection>
<group>
<thing>stuff</thing>
</group>
<group>
<thing>stuff</thing>
</group>
</collection>
/collection/group/thing=stuff
/collection/group
/collection/group/thing=stuff

<thing>
stuff

more stuff
&lt;other stuff&gt;
</thing>
/thing=stuff
/thing=
/thing=more stuff
/thing=<other stuff>

<thing flag="value">stuff</thing>
/thing/@flag=value
/thing=stuff

<?processing instruction?>
<thing/>
/?processing=instruction
/thing
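
One way to generalize from the examples above is to write the pattern down as code. The following Python sketch is only one reading of those examples, not the tools' actual implementation. The function name `flatten` is hypothetical, and it relies on the standard `xml.etree.ElementTree` parser, which discards processing instructions and comments, so the last example is not covered. The rule for repeated elements (emit a bare path before an adjacent sibling with the same name) is an assumption inferred from the two collection examples.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Return Flat-style lines for elem, following the examples in this section."""
    path = f"{prefix}/{elem.tag}"
    lines = []

    def emit_text(text):
        # Surrounding whitespace is dropped; each remaining line of text
        # becomes its own "path=value" line (blank lines give "path=").
        if text and text.strip():
            for piece in text.strip().splitlines():
                lines.append(f"{path}={piece.strip()}")

    # Attributes come first, written as path/@name=value.
    for name, value in elem.attrib.items():
        lines.append(f"{path}/@{name}={value}")

    emit_text(elem.text)

    previous_tag = None
    for child in elem:
        # Assumed rule: a bare path marks the start of a new element when
        # the previous sibling had the same name.
        if child.tag == previous_tag:
            lines.append(f"{path}/{child.tag}")
        lines.extend(flatten(child, path))
        emit_text(child.tail)  # text that follows the child, in document order
        previous_tag = child.tag

    # An element with no attributes, text, or children is just its path.
    if not lines:
        lines.append(path)
    return lines


doc = ET.fromstring(
    "<collection><group><thing>stuff</thing><thing>stuff</thing></group></collection>"
)
print("\n".join(flatten(doc)))
```

Run on the other XML fragments in this section (apart from the processing instruction), the sketch produces the Flat lines shown next to them; inputs that go beyond those examples may well be handled differently by the real tools.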

(TO DO: Add equivalent examples for CSV files.)

