1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280
|
<pre> Network Working Group J. K. Reynolds (ISI)
Request for Comments: 978 R. Gillmann (Inner Loop)
W. A. Brackenridge (Alembic)
A. Witkowski (Inner Loop)
J. Postel (ISI)
February 1986
<span class="h1">VOICE FILE INTERCHANGE PROTOCOL (VFIP)</span>
STATUS OF THIS MEMO
This memo describes a proposed voice file interchange format for use
in the ARPA-Internet community. Suggestions for improvement are
encouraged. Distribution of this memo is unlimited.
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. INTRODUCTION</span>
The purpose of the Voice File Interchange Protocol (VFIP) is to
permit the interchange of various types of speech files between
different systems. Currently, there are many different types of
voice implementations, but no specific standard has been set with an
eye towards compatability between these systems. With the increasing
interest and development of voice, specifically in Multimedia Mail,
there is an increased need to include standardized speech into a
common data structure.
The Voice File Interchange Protocol defines a header to describe the
voice data. The 18-byte header contains the identifier, the header
version number, the header length, a DTMF mask for Touch-Tones, the
recording rate in bits per second, the total time in deci-seconds
(tenths of a second), and the encoding/recording method (see
Figure 1).
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. THE VOICE FILE INTERCHANGE PROTOCOL HEADER</span>
The Voice File Interchange Protocol header is organized as follows:
2.1 The Header Version Number
The version number is 1-byte. This first version is number one.
2.2 The Header Length
The length is a 1-byte field indicating the length of the entire
header in bytes. For this first version, the length is
18 (bytes).
<span class="grey">Reynolds, et al. [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey">Voice File Interchange Protocol <a href="./rfc978">RFC 978</a></span>
2.3 The DTMF Mask
This field describes what is known about DTMF Touch-Tones in the
data. The field consists of a 16 flag bits which indicate what is
known about particular DTMF tones. The 16 possible DTMF tones, in
order, are: 0 1 2 3 4 5 6 7 8 9 # * A B C D. The low order bit
of the field is tone 0.
A 1-bit signifies that the corresponding tone is guaranteed NOT to
be in the speech file. A 0-bit signifies that it may or may not
be in the speech file. Therefore, a field of 16 zeros denotes
that nothing is known about the tones. A field of 16 ones denotes
that there are no tones in the file.
2.4 Recording Rate
The recording rate is a 32-bit field and is the approximate rate
in bits/second of the method used to record the speech. For
variable rate methods, this may be very approximate.
2.5 Total Time
A 32-bit number indicating the total time of the recording in
deci-seconds. For example, 600 indicates 1 minute of speech.
2.6 Methods of Encoding/Recording
This 6-byte ASCII field indicates the method of
encoding/recording. Names shorter than six characters are padded
out to the right with blanks (the ASCII space character, code 32
decimal). For comparisons, the names are case insensitive.
Some known methods of Encoding/Recording are:
TI - The Texas Instruments card for the IBM PC [<a href="#ref-5" title=""The TI Speech Application Tool Kit Guide"">5</a>].
IBM - PC Voice Communications Options.
NVP-1 and NVP-2 - Network Voice Protocol [<a href="#ref-1" title=""Specifications for the Network Voice Protocol (NVP)"">1</a>,<a href="#ref-2" title=""A Network Voice Protocol (NVP-II)"">2</a>].
COMPUT - Computalker card for the IBM PC [<a href="#ref-4" title=""Compu Phone for the IBM PC/XT"">4</a>].
<span class="grey">Reynolds, et al. [Page 2]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
<span class="grey">Voice File Interchange Protocol <a href="./rfc978">RFC 978</a></span>
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. SUMMARY</span>
This 18-byte header will permit interchange of speech files between
different systems, as well as facilitate automatic conversion between
formats. The header does not have to be prepended to the speech file
proper; it may be in the form of a separate associated file, if that
is more convenient.
<------------16-bits------------>
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| -DTMF- |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| -Recording- |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| -Rate- |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| -Total- |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| -Time- |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| M | E |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| T | H |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| O | D |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1
<span class="grey">Reynolds, et al. [Page 3]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
<span class="grey">Voice File Interchange Protocol <a href="./rfc978">RFC 978</a></span>
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. EXAMPLES</span>
Example 1 is for one minute of 2400 bps NVP-2 speech. Nothing is
known about DTMF tones in the data.
<------------16-bits------------>
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 | 18 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 2400 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 600 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| N | V |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| P | - |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 2 | <sp> |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example 1
<span class="grey">Reynolds, et al. [Page 4]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="grey">Voice File Interchange Protocol <a href="./rfc978">RFC 978</a></span>
Example 2 shows the header for 10 seconds of 1200 bps TI speech, with
none of the DTMF tone 0-9 in the data, but no information about
tones *, #, A-D.
<------------16-bits------------>
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 | 18 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1023 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1200 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 100 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| T | I |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| <sp> | <sp> |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| <sp> | <sp> |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example 2
REFERENCES
[<a id="ref-1">1</a>] Cohen, Danny, "Specifications for the Network Voice Protocol
(NVP)", <a href="./rfc741">RFC 741</a> (NIC 42444), USC/Information Sciences Institute,
January 1976.
[<a id="ref-2">2</a>] Cohen, Danny, "A Network Voice Protocol (NVP-II)",
USC/Information Sciences Institute, April 1981.
[<a id="ref-3">3</a>] O'Leary, G. C., "Local Access Area Facilities for Packet Voice",
MIT/LL, October 1980.
[<a id="ref-4">4</a>] Computalker, "Compu Phone for the IBM PC/XT", Santa Monica,
California, August 1985.
[<a id="ref-5">5</a>] Texas Instruments, Inc., "The TI Speech Application Tool Kit
Guide", TI Part #2232384-1, May 1985.
Reynolds, et al. [Page 5]
</pre>
|