<pre>Network Working Group                                 Annette L. DeSchon
Request for Comments: 971                                            ISI
                                                            January 1986
<span class="h1">A SURVEY OF DATA REPRESENTATION STANDARDS</span>
Status of This Memo
This RFC discusses data representation conventions in the
ARPA-Internet and suggests possible resolutions. No proposals in
this document are intended as standards for the ARPA-Internet at this
time. Rather, it is hoped that a general consensus will emerge as to
the appropriate approach to these issues, leading eventually to the
adoption of ARPA-Internet standards. Distribution of this memo is
unlimited.
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
This report is a comparison of several data representation standards
that are currently in use. The standards, or system type
definitions, that will be discussed are the CCITT X.409
recommendation, the NBS Computer Based Message System (CBMS)
standard, the DARPA Multimedia Mail system, the Courier remote procedure
call protocol, and the SUN Remote Procedure Call package.
One purpose of this report is to determine how the CCITT standard,
which is gaining wide acceptance internationally, compares with some
of the other standards that have been developed in the areas of
electronic mail, distributed interprocess communication, and remote
procedure call. The CCITT X.409 recommendation, entitled
"Presentation Transfer Syntax and Notation", is an international
standard that is part of the X.400 series Message Handling Systems
(MHS) specifications [<a href="#ref-1" title=""Message Handling Systems: Presentation Transfer Syntax and Notation"">1</a>]. It has been adopted by both the NBS and
the ISO standards organizations. In addition, some commercial
organizations have announced intentions to support a CCITT interface
for electronic mail. The NBS Computer Based Message System (CBMS)
standard was developed previously and was published as a Federal
Information Processing Standard (FIPS Publication 98) in 1983 [<a href="#ref-3" title=""Specification for Message Format for Computer Based Message Systems"">3</a>].
The DARPA Multimedia Mail system is an experimental electronic mail
system which is in use in the DARPA Internet [<a href="#ref-2" title=""Research into Multimedia Message System Architecture"">2</a>,<a href="#ref-4" title=""Internet Multimedia Mail Transfer Protocol"">4</a>,<a href="#ref-5" title=""Internet Multimedia Mail Document Format"">5</a>]. It is used to
create and distribute messages that incorporate text, graphics,
stored speech, and images and has been implemented on several very
different machines. Courier is the XEROX network systems remote
procedure call protocol [<a href="#ref-7" title=""Courier: The Remote Procedure Call Protocol"">7</a>]. The SUN Remote Procedure Call package
implements "network pipes" between UNIX machines [<a href="#ref-6" title=""Extended Data Representation Reference Manual"">6</a>].
<span class="grey">DeSchon [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc971">RFC 971</a> January 1986</span>
A Survey of Data Representation Standards
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Background</span>
This section presents a brief overview of the basic terminology and
approach of each data representation standard.
2.1. Interprocess Communication Standards
The standards that are oriented towards distributed interprocess
communication or remote procedure call, between like machines,
generally favor the use of types that map easily into the types
defined in the programming language in use on the system. For
example, the types defined for the XEROX Courier system resemble
the types found in the Mesa programming language. Similarly, the
SUN Remote Procedure Call system types resemble the types found in
the C programming language. An advantage of a system implemented
using like machines is that the external data representation can
be defined in such a way that the conversion to and from the local
format is minimal.
2.1.1. Courier
The Courier standard data types are used to define the data
objects which are transported bi-directionally between system
elements that are running the Courier remote procedure call
protocol. The "standard representation" of a type is the
encoding of the data which is transmitted. The "standard
notation" refers to the conventions for the interpretation of
the data by higher-level applications. The standard
representation of a data object encodes the value of the
object, but the type of the object is determined by the
software that generates or interprets the representation.
2.1.2. SUN Remote Procedure Call Package
The SUN Remote Procedure Call package includes routines which
allow a process on one UNIX machine to consume data produced by
a process on another UNIX machine. This is called a "network
pipe" and is an extension of the standard UNIX pipe. The
"eXternal Data Representation (XDR)" standard defines the
routines that are used to encode or "serialize" data for
transmission, or to decode or "deserialize" data for local
interpretation. The syntax suggests that perhaps it should be
called "remote interprocess communication" rather than "remote
procedure call".
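
As a rough illustration of what "serialize" and "deserialize" mean
here, the following Python sketch packs an integer and a string into
an XDR-style byte stream and unpacks them again. It assumes the layout
later published for XDR (four-octet big-endian integers, and strings
sent as a four-octet count followed by the bytes, zero-padded to a
multiple of four octets). The function names are hypothetical and are
not the routines of the SUN package.

```python
import struct

def xdr_int(value):
    """Serialize a signed 32-bit integer as four big-endian octets."""
    return struct.pack(">i", value)

def xdr_string(text):
    """Serialize a string: 4-octet count, the bytes, zero padding to 4 octets."""
    data = text.encode("ascii")
    padding = -len(data) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_read_int(buf, offset):
    """Deserialize an integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4

def xdr_read_string(buf, offset):
    """Deserialize a counted string; return (text, next offset)."""
    (count,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + count].decode("ascii")
    padded = count + (-count % 4)   # step over the padding octets as well
    return text, start + padded

stream = xdr_int(-2) + xdr_string("hello")
value, pos = xdr_read_int(stream, 0)
text, pos = xdr_read_string(stream, pos)
print(value, text, len(stream))
```

Note that nothing in the stream identifies the first four octets as an
integer; the reader must already know the order in which to call the
decoding routines.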
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
2.2. Message Standards
The message-oriented standards, including DARPA Multimedia Mail,
NBS CBMS, and the CCITT X.409 standards, seem to favor more
general, highly extensible type definitions. This may have
something to do with the expectation that a system will include
many different machines, programmed using many different
programming languages.
2.2.1. DARPA Multimedia Mail
The DARPA Multimedia Mail system was developed for use in the DoD
Internet community. The set of data elements used in the
Multimedia Message Handling Facility (MMHF) is referred to as
its "presentation transfer syntax". The encoding of these data
elements varies with the data type being represented. Each
begins with a one-octet "element-code". Some data elements are
of a pre-determined length. For example, the INTEGER data
element occupies five octets, one for the element-code, and
four which contain the "value component". Other data elements,
however, may vary in length. For example, the TEXT data
element is made up of a one-octet element-code, a three-octet
count of the characters to follow, and a variable number of
octets, each containing one right-justified seven-bit ASCII
character. The element-code and the length constitute the "tag
component".
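
The two layouts just described can be sketched in Python. The element
codes (4 for INTEGER, 8 for TEXT) are those listed in the scorecard in
Section 4. The big-endian byte order, the two's-complement value, and
the function names are assumptions made for illustration, not details
taken from the MMM specification.

```python
INTEGER_CODE = 4   # element-code from the Section 4 scorecard
TEXT_CODE = 8      # element-code from the Section 4 scorecard

def encode_integer(value):
    """One-octet element-code plus a four-octet value component."""
    # Assumption: big-endian, two's-complement value component.
    return bytes([INTEGER_CODE]) + value.to_bytes(4, "big", signed=True)

def encode_text(text):
    """Element-code, three-octet character count, one octet per character."""
    chars = text.encode("ascii")   # seven-bit ASCII, high bit of each octet zero
    tag = bytes([TEXT_CODE]) + len(chars).to_bytes(3, "big")
    return tag + chars

print(encode_integer(1986).hex())
print(encode_text("mail").hex())
```

Under these assumptions the INTEGER always occupies five octets, while
the TEXT element occupies four octets of tag component plus one octet
per character.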
A "base data element" is self contained, while a "structured
data element" is formed using other data elements. The LIST
data element is used to create structures composed of other
elements. The tag component of a LIST is made up of a
one-octet element-code, a three-octet count of the number of
octets to follow, and a two-octet count of the number of
elements that follow. The PROPLIST data element is used to
create a structure that consists of a set of unordered
name-value pairs. The tag component of a PROPLIST is made up
of a one-octet element-code, a three-octet count of the number
of octets to follow, and a one-octet count of the number of
name-value pairs in the PROPLIST. Both the LIST and the
PROPLIST elements are followed by an ENDLIST data element.
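
A LIST tag component as described above might be assembled as
follows. The element codes (9 for LIST, 11 for ENDLIST) come from the
Section 4 scorecard. Several details are assumptions made for this
sketch only: big-endian counts, an ENDLIST that consists of its
element-code alone, and an octet count that covers everything after
the count field itself, including the element count and the trailing
ENDLIST.

```python
LIST_CODE = 9       # from the Section 4 scorecard
ENDLIST_CODE = 11   # from the Section 4 scorecard

def encode_list(elements):
    """Wrap already-encoded data elements in a LIST ... ENDLIST structure."""
    body = b"".join(elements)
    endlist = bytes([ENDLIST_CODE])            # assumption: code octet only
    item_count = len(elements).to_bytes(2, "big")
    following = item_count + body + endlist    # assumption: span of the octet count
    octet_count = len(following).to_bytes(3, "big")
    return bytes([LIST_CODE]) + octet_count + following

# Two pre-encoded INTEGER elements (element-code 4, four-octet value).
one = bytes([4, 0, 0, 0, 1])
two = bytes([4, 0, 0, 0, 2])
encoded = encode_list([one, two])
print(encoded.hex())
```

A PROPLIST could be sketched the same way, with a one-octet pair count
in place of the two-octet element count.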
2.2.2. NBS Computer Based Message System
The NBS Computer Based Message System (CBMS) standard was
developed to specify the format of a message at the interface
between different computer-based message systems. Each data
element consists of a series of "components". The five
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
possible types of component are the "identifier octet", the
"length code", the "qualifier", the "property-list" component,
and the "data element contents". Every data element contains
an identifier octet and a length code. The identifier octet
contains a one-bit flag that signifies whether the data element
contains a property-list, and a code identifying the data
element and signifying whether it contains a qualifier. In the
NBS standard, the property-list is associated with a data
element and contains properties such as a "printing-name" or a
"comment". The meaning of the qualifier depends on the data
element code. The length code indicates the number of octets
following, and is between one and three octets in length.
Each data element is inherently a "primitive data element",
which contains a basic item of information, or a "constructor
data element", which contains one or more data elements. The
"field" data element (itself a constructor) uses a qualifier
component, which contains a "field identifier" to indicate
which specific field is being represented within a message.
2.2.3. CCITT Recommendation X.409
The CCITT recommendation X.409 defines the notation and the
representational technique used to specify and to encode the
Message Handling System (MHS) protocols. The following is a
description of the CCITT approach to encoding type definitions.
A data element consists of three components, the "identifier"
(type), the "length", and the "contents". An element and its
components consist of a sequence of an integral number of
octets. An identifier consists of a "class" ("universal",
"application-wide", "context-specific", or "private-use"), a
"form" ("primitive" or "constructor"), and the "id code".
There is a convention defined for both single-octet and
multi-octet identifiers. The length specifies the length of
the contents in octets, and is itself variable in length.
There is also an "indefinite" value defined for the length;
this means that no length for the contents is specified, and
the contents is terminated with the "end-of-contents" (EOC)
element. In X.409 it is possible to determine whether a data
element is a primitive or a constructor from the form part of
the identifier. In addition it is possible to "tag" the data
by attaching meaning to an id code within the context of a
specific application.
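
The identifier, length, and contents layout can be sketched as below.
The bit positions are not given in this survey; the sketch assumes the
layout that the later ASN.1 Basic Encoding Rules share with X.409
(class in the two high-order bits, form in the next bit, id code in
the low five bits, and a long-form length whose first octet is 0x80
plus the number of length octets). Only single-octet identifiers and
definite lengths are handled, and the function names are hypothetical.

```python
CLASSES = {"universal": 0, "application-wide": 1,
           "context-specific": 2, "private-use": 3}

def identifier(cls, constructor, id_code):
    """Single-octet identifier: class, form, and an id code of 0..30."""
    if id_code not in range(31):
        raise ValueError("multi-octet identifiers are not sketched here")
    return bytes([CLASSES[cls] * 64 + (32 if constructor else 0) + id_code])

def length(n):
    """Definite length: one octet below 128, else 0x80 + count, then the count."""
    if n in range(128):
        return bytes([n])
    octets = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 + len(octets)]) + octets

def element(cls, constructor, id_code, contents):
    """Concatenate identifier, length, and contents into one data element."""
    return identifier(cls, constructor, id_code) + length(len(contents)) + contents

# Id codes from the Section 4 scorecard: Boolean (1), IA5 (22), Sequence (16).
flag = element("universal", False, 1, b"\xff")   # assumption: nonzero means TRUE
name = element("universal", False, 22, b"ISI")
seq = element("universal", True, 16, flag + name)
print(seq.hex())
```

Because the form bit is part of the identifier, a receiver can tell
that the outer element is a constructor, and descend into it, without
knowing what a Sequence is for.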
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Implicit Versus Explicit Representation</span>
In both the SUN Remote Procedure Call system and the XEROX Courier
system the type definitions of external data are implicit. This
means that for a given type of call, or message, the type definitions
that are to be used to interpret the data are agreed upon by the
sender and the receiver in advance. In other words, parameters (or
message fields) are assumed to be in a predefined order. Each
parameter is assumed to be of a predefined type. This means the data
cannot be reformatted into the local form until it reaches a process
that knows about the types of specific parameters. At this point,
the conversion can be accomplished using system routines that know
how to convert from the external format to the local format. If the
system is homogeneous there may be very little conversion required.
In addition, no extra overhead of sending the type definitions with
the data is incurred.
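
A minimal Python sketch of implicit representation follows. The byte
stream carries values only; both ends share a layout agreed upon in
advance. The layout string, the parameter names, and the big-endian
byte order are hypothetical and stand in for whatever a given call
would define. The 16-bit and 32-bit sizes echo the Courier Cardinal
and Long Integer types listed in Section 4.

```python
import struct

# Agreed upon in advance by sender and receiver: a 16-bit unsigned "account"
# followed by a 32-bit signed "balance". Nothing in the stream says so.
CALL_LAYOUT = ">Hl"

def send(account, balance):
    """Encode the parameters in their predefined order."""
    return struct.pack(CALL_LAYOUT, account, balance)

def receive(stream):
    """Decode the octets; only a process that knows CALL_LAYOUT can do this."""
    account, balance = struct.unpack(CALL_LAYOUT, stream)
    return {"account": account, "balance": balance}

wire = send(7, -250)
print(wire.hex(), len(wire))
print(receive(wire))
```

The six octets on the wire are exactly the six octets of data, which
is the saving in overhead noted above.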
In the DARPA Multimedia Mail system, the NBS CBMS standard, and the
CCITT X.409 recommendation, type definitions are explicit. In this
case the type definitions are encoded into the message. There are
several advantages to this approach. One advantage is that it allows
a low level receiver process in the destination host to convert the
data from the standard form to a form appropriate for the local host,
as it is received. This can increase efficiency if it allows the
destination host to avoid passing around data that does not conform
to the local word boundaries. Another advantage is that it provides
flexibility for future expansion. Since the overall length is a part
of the type definition, it allows a host to deal with or ignore data
of types that it does not necessarily understand. Since the
interpretation of the data is not dependent on its position, message
fields (or parameters) can be reordered, or optionally omitted. The
disadvantages of this approach are as follows. Assuming that no
field could be omitted, the external representation of the message
may be longer than it would have been if an implicit representation
had been used. In addition, extra time may be consumed by the
conversion between external format and local format, since the
external format almost certainly will not match the local format for
any of the participants.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Data Representation Standards Scorecard</span>
The following table is a comparison of the data elements defined for
the various standards being discussed. It is provided in order to
give a general idea of the types defined for each standard, but it
should be noted that the grouping of these types does not indicate that
one type corresponds exactly to any other. Where it is applicable,
the identifier code appears in parentheses following the name of the
data element. Under "NUMBER", "S" stands for signed, "U" stands for
unsigned, "V" stands for variable, and the number represents the
number of bits. For example, "Integer S16" means a "signed 16-bit
integer".
Type     CCITT       MMM         NBS          XEROX       Sun
-----------------------------------------------------------------------
END    | End-of-   | ENDLIST   | End-of-    | --        | --
       | Contents  | (11)      | Constructor|           |
       | (0)       |           | (1)        |           |
       |           |           |            |           |
PAD    | Null (5)  | NOP (0)   | No-Op (0)  | --        | --
       |           | PAD (1)   | Padding    |           |
       |           |           | (33)       |           |
       |           |           |            |           |
RECORD | Set (17)  | PROPLIST  | Set (11)   | --        | --
       |           | (14)      |            |           |
       | Sequence  | LIST (9)  | Sequence   | Sequence  | Structure
       | (16)      |           | (10)       |           |
       |           |           |            | Record    |
       |           |           | Message    |           |
       |           |           | (77)       |           |
       | --        | --        | --         | Array     | Fixed Array
       |           |           |            |           | Counted Array
       | "Choice"  | --        | --         | Choice    |Discriminated-
       | "Any"     |           |            |           | Union
       |           |           |            |           |
       | "Tagged"  | "name"    | Field (76) | --        | --
       |           |           |Unique-ID(9)|           |
       | --        | SHARE-TAG | --         | --        | --
       |           | (12)      |            |           |
       |           | SHARE-REF |            |           |
       |           | (13)      |            |           |
       |           |           |            |           |
       | --        | --        | Compressed | --        | --
       |           |           | (70)       |           |
       | --        | ENCRYPT   | Encrypted  | --        | --
       |           | (14)      | (71)       |           |
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
Type     CCITT       MMM         NBS          XEROX       Sun
-----------------------------------------------------------------------
BOOLEAN| Boolean(1)| BOOLEAN(2)| Boolean(8) | Boolean   | Boolean
       |           |           |            |           |
NUMBER | Integer(2)| EPI (5)   | Integer(32)| Integer   | Integer
       |   SV      |   SV      |   SV       |   S16     |   S32
       |           | INDEX (3) |            | Cardinal  | Unsigned Int
       |           |   U16     |            |   U16     |   U32
       |           | INTEGER(4)|            |Unspecified|Enumeration
       |           |   S32     |            |   16      |   32
       |           |           |            | Long Int  |Hyper Integer
       |           |           |            |   S32     |   S64
       |           |           |            | Long Card |Uns Hyper Int
       |           |           |            |   U32     |   U64
       |           |           |            |           | Double Prec
       |           |           |            |           |   64
       | --        | FLOAT (15)| --         | --        | Float Pt
       |           |   64      |            |           |   32
       |           |           |            |           |
BIT-   | Bit String| BITSTR(6) | Bit-String | --        | --
 STRING| (3)       |           | (67)       |           |
       | Octet-    | --        | --         | --        | Opaque
       |  String(4)|           |            |           |
       |           |           |            |           |
STRING | IA5 (22)  | TEXT (8)  | ASCII-     | String    | Counted-
       |           |           |  String (2)|           | Byte String
       |           | NAME (7)  |            |           |
       | Numeric   |           |            |           |
       | (18)      |           |            |           |
       | Printable |           |            |           |
       | (19)      |           |            |           |
       | T.61 (20) |           |            |           |
       | Videotex  |           |            |           |
       | (21)      |           |            |           |
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
Type     CCITT       MMM         NBS          XEROX       Sun
-----------------------------------------------------------------------
OTHER  | UTC Time  | --        | Date (40)  | --        | --
       | (23)      |           |            |           |
       | Gen Time  |           |            |           |
       | (24)      |           |            |           |
       | --        | --        | Property-  | --        | --
       |           |           |   List (36)|           |
       | --        | --        |Property(69)| --        | --
       |           |           |            |           |
       | --        | --        | --         | Procedure | --
       |           |           |            |           |
       | --        | --        | Vendor-    | --        | --
       |           |           | Defined    |           |
       |           |           | (127)      |           |
       |           |           | Extension  |           |
       |           |           | (126)      |           |
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Conclusions</span>
Of the standards discussed in this survey, the CCITT approach (X.409)
has already gained wide acceptance. For a system that will include a
number of dissimilar hosts, as might be the case for an Internet
application, a standard that employs explicit representation, such as
the CCITT X.409, would probably work well. Using the CCITT X.409
standard it is possible to construct most of the data elements that
are specified for the other standards, with the possible exception of
the "floating point" type. However, some of the flexibility that has
been built into this standard, such as the "private-use class", may
lead to ambiguity and a lack of coordination between implementors at
different sites. If a standard such as the CCITT X.409 were to be used
in an Internet experiment, a fully defined (but large) subset would
probably have to be selected.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-9" ></span>
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. References</span>
[<a id="ref-1">1</a>] "Message Handling Systems: Presentation Transfer Syntax and
Notation", Recommendation X.409, Document AP VIII-66-E,
International Telegraph and Telephone Consultative Committee
(CCITT), Malaga-Torremolinos, June, 1984.
[<a id="ref-2">2</a>] J. Garcia-Luna, A. Poggio, and D. Elliot, "Research into
Multimedia Message System Architecture", SRI International,
February, 1984.
[<a id="ref-3">3</a>] "Specification for Message Format for Computer Based Message
Systems", FIPS Pub 98 (also published as <a href="./rfc841">RFC 841</a>), National
Bureau of Standards, January, 1983.
[<a id="ref-4">4</a>] J. Postel, "Internet Multimedia Mail Transfer Protocol", USC
Information Sciences Institute, MMM-11 (<a href="./rfc759">RFC-759</a> revised), March,
1982.
[<a id="ref-5">5</a>] J. Postel, "Internet Multimedia Mail Document Format", USC
Information Sciences Institute, MMM-12 (<a href="./rfc767">RFC-767</a> revised), March,
1982.
[<a id="ref-6">6</a>] "Extended Data Representation Reference Manual", SUN
Microsystems, September, 1984.
[<a id="ref-7">7</a>] "Courier: The Remote Procedure Call Protocol", XSIS-038112,
XEROX Corporation, December, 1981.
</pre>