<HEAD>
<TITLE>EMIL version 2 TUTORIAL</TITLE>
</HEAD>
<BODY>
<H1>TUTORIAL FOR EMIL VERSION 2.1
</H1>
<EM>Written by Martin Wendel, ITS, Uppsala university.
Martin.Wendel@its.uu.se
</EM>
<HR>
<A HREF=comparison.html><IMG ALIGN=MIDDLE SRC=arrow_right3.gif></A><A HREF=main.html><IMG ALIGN=MIDDLE SRC=arrow_up2.gif></A><A HREF=why.html><IMG ALIGN=MIDDLE SRC=arrow_left3.gif></A>
<H2>HOW EMIL V2 DOES MESSAGE CONVERSION.
</H2>
<P>
To understand message conversion, it is important to understand
the components of a message and the types of these components. A message
consists of a header and a body. The format of the header is
defined in RFC 822 and RFC 1522. It consists of lines
of text, each divided into two parts, the field and the value, separated by
a colon ':'. Apart from a few details, the header is straightforward.
The body, on the other hand, is somewhat trickier to understand.
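<P>
The field and value split can be illustrated with a short Python sketch.
It is not Emil's own code: the function name parse_header and the sample
header are made up for illustration. The only detail it handles beyond the
colon split is the folding of continuation lines, which RFC 822 permits.
<PRE>
```python
def parse_header(lines):
    """Split raw header lines into (field, value) pairs.

    A line starting with a space or tab continues the previous field,
    as RFC 822 allows for folded headers.
    """
    fields = []
    for line in lines:
        if line[:1] in (" ", "\t") and fields:
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon: not a header line, skipped in this sketch
        fields.append((name.strip(), value.strip()))
    return fields


# Hypothetical sample header, one string per line.
sample = [
    "From: Martin.Wendel@its.uu.se",
    "Subject: A folded",
    "\tsubject line",
    "MIME-Version: 1.0",
]
header = parse_header(sample)
```
</PRE>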
<P>
The body of a message consists of text and/or encoded data, where the
encoded data may itself be text. Emil recognizes the encodings BinHex, Base64,
Quoted-Printable and UUencode. Text is any part of the message that is
not recognized as an encoded enclosure. Text is a raw format, whereas
encoded data follows the syntax of its encoding. <EM>This means that incorrect
or incomplete encodings are not recognized and are thus treated as raw text
(as a safety measure).</EM>
<P>
Emil works in two passes. After loading the message, it first
interprets the header and body of the message before applying any
conversions. First the format is recognized by interpreting
the header of the message. Then the data of the message is structured into
a hierarchical message structure.
<P>
UUencoded and BinHexed enclosures are recognized by their preamble, and a
syntax check is made on them to make reasonably sure they are valid. When Emil
receives a MIME or Mailtool message, encoded enclosures are defined in the
header. Emil recognizes these definitions and trusts them to be valid; no
further processing is done initially. The rest of the body parts, along with
any erroneous BinHex and UUencode parts, are treated as text. The text is
checked to determine whether it is 7bit or 8bit.
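<P>
As a rough illustration of recognition by preamble plus a syntax check, the
following Python sketch tests a candidate UUencode enclosure and classifies
text as 7bit or 8bit. It is an assumption about how such a check could look,
not a description of Emil's implementation. The names looks_like_uuencode and
text_width are hypothetical, and the syntax check is delegated to Python's
binascii.a2b_uu, which is more lenient than a hand-written checker might be.
<PRE>
```python
import binascii
import re

# Preamble of a UUencoded enclosure: "begin", a file mode, and a file name.
BEGIN_RE = re.compile(r"^begin [0-7]{3,4} \S+$")


def looks_like_uuencode(lines):
    """Check the preamble, every data line, and the "end" trailer."""
    if not lines or not BEGIN_RE.match(lines[0]):
        return False
    if "end" not in lines[1:]:
        return False  # incomplete enclosure: caller treats it as text
    end = lines.index("end", 1)
    try:
        for line in lines[1:end]:
            binascii.a2b_uu(line)  # raises binascii.Error on bad syntax
    except binascii.Error:
        return False
    return True


def text_width(data):
    """Return "8bit" if any byte has the high bit set, else "7bit"."""
    return "8bit" if any(b > 127 for b in data) else "7bit"


# Hypothetical samples: a valid enclosure holding b"Cat", and plain text
# that merely starts with a "begin" line.
good = ["begin 644 cat.txt", "#0V%T", "`", "end"]
bad = ["begin 644 cat.txt", "this is plain text, not uuencode", "end"]
```
</PRE>
A block for which looks_like_uuencode returns False would simply stay in the
text part of the body.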
<P>
Emil now has a hierarchical representation of the message, and each part
is marked with the type of its encoding. The trusted encodings of a MIME
or Mailtool message may be erroneous as well. However, the definitions in the
header are strong evidence that they are correct. If an error is found
in spite of that,
the message part is left untouched, but it is not treated as text.
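<P>
The "leave it untouched" rule for a declared encoding can be sketched in
Python as follows. This is only an illustration under the assumption that a
failed decode is detected at conversion time; the function name
decode_declared_base64 and its return convention are invented for the example.
<PRE>
```python
import base64
import binascii


def decode_declared_base64(raw):
    """Decode a part whose header declares Base64.

    Returns (data, converted). If the data does not decode, the raw data
    is returned unchanged with converted set to False, so the caller can
    pass the part through as it arrived.
    """
    try:
        compact = "".join(raw.split())  # drop line breaks and spaces
        return base64.b64decode(compact, validate=True), True
    except (binascii.Error, ValueError):
        return raw, False


ok = decode_declared_base64("SGVsbG8s\nIEVtaWwh")
broken = decode_declared_base64("SGVsbG8sIEVtaWw")  # truncated data
```
</PRE>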
<P>
<EM>Conversion is applied as specified by the target format.</EM> Emil has a
clear view of the incoming message and can work in a straightforward way,
walking through the hierarchical structure. Each object in the structure contains
headers, data, and pointers to other objects. When applying conversion, the
data of each object is first converted to what is specified in the target format;
then the headers of each object are updated based on the resulting type
and encoding of the data.
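<P>
A minimal Python sketch of such a structure and of the data-then-headers
order is given below. The class Part, its field names, and the use of a list
of children in place of pointers are assumptions made for illustration; Emil's
actual data structures are not shown in this tutorial.
<PRE>
```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One object in the message structure (illustrative names only)."""
    headers: dict
    data: bytes = b""
    encoding: str = "7bit"
    children: list = field(default_factory=list)


def walk(part, depth=0):
    """Yield (depth, part) for a part and its descendants, parents first."""
    yield depth, part
    for child in part.children:
        yield from walk(child, depth + 1)


def convert(root, convert_data, convert_headers):
    """Convert the data of every object, then the headers of every object."""
    for _, part in walk(root):
        convert_data(part)
    for _, part in walk(root):
        convert_headers(part)


# A multipart root with a text child and an image child.
root = Part({"Content-Type": "multipart/mixed"}, children=[
    Part({"Content-Type": "text/plain"}, b"hej", "quoted-printable"),
    Part({"Content-Type": "image/gif"}, b"R0lG", "base64"),
])
log = []
convert(root,
        lambda p: log.append("data " + p.encoding),
        lambda p: log.append("headers " + p.encoding))
```
</PRE>
The header pass runs only after all data has been converted, because the new
headers depend on the resulting type and encoding of the data.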
<P>
<B>An example:</B> The target format specifies Sun Mailtool, UUencoded attachments
and ISO-646-SE text. A MIME message containing a Quoted-Printable encoded
text/plain using ISO-8859-1 and a Base64 encoded Image/GIF will be converted
as follows (this is a fairly long and detailed description):
<OL>
<LI>The entire message is loaded into memory.
<LI>The header of the message is examined and loaded into a structure.
<DIR>
<LI>This yields format MIME and type Multipart/Mixed. The boundary
is saved.
</DIR>
<LI>The start of the body of the message is marked up at the end of
the header.
<LI>Find the end boundary, the end of the message is marked just
before the end boundary. The root object in the message structure
is completed.
<LI>Start off a child object.
<LI>We've got boundaries, go find the first boundary.
<LI>Examine the header of the first body part.
<DIR>
<LI> This yields type=Text/plain, charset=ISO-8859-1 and
encoding=Quoted-Printable.
</DIR>
<LI>Find the second boundary, terminating the first body part and
initiating the second. The first child object is completed.
<LI>Start off a sister object to the previous child object.
<LI>Examine the header of the second body part.
<DIR>
<LI> This yields type=Image/GIF, encoding=Base64.
</DIR>
<LI>There are no more boundaries (the end boundary was detected
in [4], and the end of data as seen by the second bodypart is marked
just before the end boundary), terminating the second bodypart.
<LI>End of data is reached, message parsing is completed. We've got
a hierarchical message structure describing the incoming message.
<LI>Apply conversion.
<LI>Parse the message structure, converting the data.
<LI>First object (root object) is multipart, no data to be converted.
<LI>Second object (first child) is a text with charset=ISO-8859-1
and encoding quoted-printable.
<LI>Target format does not want quoted-printable and does not want
charset=ISO-8859-1. Thus, decode quoted-printable.
<DIR>
<LI> This yields an 8bit text with charset=ISO-8859-1.
</DIR>
<LI>Target format does not want charset=ISO-8859-1. Convert charset
to ISO-646-SE.
<DIR>
<LI> This yields a 7bit text with charset=ISO-646-SE.
</DIR>
<LI>Target format wants charset=ISO-646-SE, conversion on this data
is completed.
<LI>Third object (second child) is a GIF encoded in Base64.
<LI>Target format does not want Base64, decode Base64.
<DIR>
<LI> This yields a GIF with encoding=binary.
</DIR>
<LI>Target format does not want binary, encode UUencode.
<DIR>
<LI> This yields a GIF with encoding=UUencode.
</DIR>
<LI>Target format wants UUencode, conversion on this data is completed.
<LI>All data is now converted according to the target format. Start
converting headers.
<LI>All the MIME-specific headers in the root header are marked as such
in the first parse of the header. This includes
MIME-Version and Content-Type. These will not be part of the output.
<LI>The root object is a multipart, add header Content-Type
X-Sun-Encoding. Also add the Mailtool boundary to the boundary
string.
<LI>The first bodypart is a text. Add the Mailtool headers. Also
add header Content-Lines: number of lines.
<LI>The second bodypart is a GIF. Add the Mailtool headers. Also add
header Content-Lines: number of lines.
<LI>The message conversion is completed. Output the message.
<LI>Print the root header.
<LI>Print the Mailtool boundary.
<LI>Print the header of the first child.
<LI>Print the body of the first child.
<LI>Print the Mailtool boundary.
<LI>Print the header of the second child.
<LI>Print the body of the second child.
</OL>
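<P>
The data conversion steps of the example above (decode Quoted-Printable,
convert the charset, decode Base64, encode UUencode) can be sketched in Python
as follows. This is an illustration, not Emil's code. The function names are
hypothetical, and the charset table is an assumption that covers only the six
Swedish letters; a complete ISO-646-SE conversion would need to handle more
characters.
<PRE>
```python
import base64
import binascii
import quopri

# Partial ISO-8859-1 to ISO-646-SE table: only the six Swedish letters
# (upper and lower case A-ring, A-diaeresis, O-diaeresis). Bytes that are
# not in the table are passed through unchanged in this sketch.
SE_TABLE = {0xC4: 0x5B, 0xD6: 0x5C, 0xC5: 0x5D,
            0xE4: 0x7B, 0xF6: 0x7C, 0xE5: 0x7D}


def qp_to_iso646se(qp_data):
    """Decode Quoted-Printable, then map ISO-8859-1 bytes to ISO-646-SE."""
    eight_bit = quopri.decodestring(qp_data)
    return bytes(SE_TABLE.get(b, b) for b in eight_bit)


def base64_to_uuencode(b64_data, name):
    """Decode Base64 to binary, then UUencode it in lines of 45 bytes."""
    binary = base64.b64decode(b64_data)
    lines = [("begin 644 " + name + "\n").encode("ascii")]
    for i in range(0, len(binary), 45):
        lines.append(binascii.b2a_uu(binary[i:i + 45]))
    lines.append(b"`\nend\n")  # zero-length line, then the trailer
    return b"".join(lines)


text = qp_to_iso646se(b"Hall=E5 v=E4rlden")
enclosure = base64_to_uuencode(base64.b64encode(b"Cat"), "cat.txt")
```
</PRE>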
<HR>
<A HREF=comparison.html><IMG ALIGN=MIDDLE SRC=arrow_right3.gif></A><A HREF=main.html><IMG ALIGN=MIDDLE SRC=arrow_up2.gif></A><A HREF=why.html><IMG ALIGN=MIDDLE SRC=arrow_left3.gif></A>
<hr size="4" noshade>
<ADDRESS>
<table WIDTH="95%">
<td>
March 1996<p>
<B>ITS Uppsala university</B><BR>
Box 887<BR>
751 08 Uppsala<BR>
SWEDEN<P>
</td>
<td ALIGN="right" VALIGN="middle">
<a href="mailto:Martin.Wendel@its.uu.se">Martin Wendel</a>
</td>
<td ALIGN="left" VALIGN="middle">
<a href="mailto:Martin.Wendel@its.uu.se">
<IMG border="0" SRC="binpobox.gif" ALT="E-Mail: "></a>
</td>
</table>
</ADDRESS>
</body>
</html>