<pre>Network Working Group A. Li
Request for Comments: 3558 UCLA
Category: Standards Track July 2003
<span class="h1">RTP Payload Format for Enhanced Variable Rate Codecs (EVRC)</span>
<span class="h1">and Selectable Mode Vocoders (SMV)</span>
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract
This document describes the RTP payload format for Enhanced Variable
Rate Codec (EVRC) Speech and Selectable Mode Vocoder (SMV) Speech.
Two sub-formats are specified for different application scenarios. A
bundled/interleaved format is included to reduce the effect of packet
loss on speech quality and amortize the overhead of the RTP header
over more than one speech frame. A non-bundled format is also
supported for conversational applications.
Table of Contents
<a href="#section-1">1</a>. Introduction ................................................... <a href="#page-2">2</a>
<a href="#section-2">2</a>. Background ..................................................... <a href="#page-2">2</a>
<a href="#section-3">3</a>. The Codecs Supported ........................................... <a href="#page-3">3</a>
<a href="#section-3.1">3.1</a>. EVRC ...................................................... <a href="#page-3">3</a>
<a href="#section-3.2">3.2</a>. SMV ....................................................... <a href="#page-3">3</a>
<a href="#section-3.3">3.3</a>. Other Frame-Based Vocoders ................................ <a href="#page-4">4</a>
<a href="#section-4">4</a>. RTP/Vocoder Packet Format ...................................... <a href="#page-4">4</a>
<a href="#section-4.1">4.1</a>. Interleaved/Bundled Packet Format ......................... <a href="#page-5">5</a>
<a href="#section-4.2">4.2</a>. Header-Free Packet Format ................................. <a href="#page-6">6</a>
<a href="#section-4.3">4.3</a>. Determining the Format of Packets ......................... <a href="#page-7">7</a>
<a href="#section-5">5</a>. Packet Table of Contents Entries and Codec Data Frame Format ... <a href="#page-7">7</a>
<a href="#section-5.1">5.1</a>. Packet Table of Contents entries .......................... <a href="#page-7">7</a>
<a href="#section-5.2">5.2</a>. Codec Data Frames ......................................... <a href="#page-8">8</a>
<a href="#section-6">6</a>. Interleaving Codec Data Frames ................................. <a href="#page-9">9</a>
<a href="#section-7">7</a>. Bundling Codec Data Frames .................................... <a href="#page-12">12</a>
<a href="#section-8">8</a>. Handling Missing Codec Data Frames ............................ <a href="#page-12">12</a>
<span class="grey">Li Standards Track [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc3558">RFC 3558</a> RTP Payload Format for EVRC and SMV July 2003</span>
<a href="#section-9">9</a>. Implementation Issues ......................................... <a href="#page-12">12</a>
<a href="#section-9.1">9.1</a>. Interleaving Length .......................................<a href="#page-12">12</a>
<a href="#section-9.2">9.2</a>. Validation of Received Packets ............................<a href="#page-13">13</a>
<a href="#section-9.3">9.3</a>. Processing the Late Packets ...............................<a href="#page-13">13</a>
<a href="#section-10">10</a>. Mode Request ................................................. <a href="#page-13">13</a>
<a href="#section-11">11</a>. Storage Format ............................................... <a href="#page-14">14</a>
<a href="#section-12">12</a>. IANA Considerations .......................................... <a href="#page-15">15</a>
<a href="#section-12.1">12.1</a>. Registration of Media Type EVRC ..........................<a href="#page-15">15</a>
<a href="#section-12.2">12.2</a>. Registration of Media Type EVRC0 .........................<a href="#page-16">16</a>
<a href="#section-12.3">12.3</a>. Registration of Media Type SMV ...........................<a href="#page-17">17</a>
<a href="#section-12.4">12.4</a>. Registration of Media Type SMV0 ..........................<a href="#page-18">18</a>
<a href="#section-13">13</a>. Mapping to SDP Parameters .................................... <a href="#page-19">19</a>
<a href="#section-14">14</a>. Security Considerations ...................................... <a href="#page-20">20</a>
<a href="#section-15">15</a>. Adding Support of Other Frame-Based Vocoders ................. <a href="#page-20">20</a>
<a href="#section-16">16</a>. Acknowledgements ............................................. <a href="#page-21">21</a>
<a href="#section-17">17</a>. References ................................................... <a href="#page-21">21</a>
<a href="#section-17.1">17.1</a> Normative ................................................ <a href="#page-21">21</a>
<a href="#section-17.2">17.2</a> Informative .............................................. <a href="#page-22">22</a>
<a href="#section-18">18</a>. Author's Address ............................................. <a href="#page-22">22</a>
<a href="#section-19">19</a>. Full Copyright Statement ..................................... <a href="#page-23">23</a>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
This document describes how speech compressed with EVRC [<a href="#ref-1" title=""Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems"">1</a>] or SMV
[<a href="#ref-2" title=""Selectable Mode Vocoder, Service Option for Wideband Spread Spectrum Communication Systems"">2</a>] may be formatted for use as an RTP payload type. The format is
also extensible to other codecs that generate a similar set of frame
types. Two methods are provided to packetize the codec data frames
into RTP packets: an interleaved/bundled format and a zero-header
format. The sender may choose the best format for each application
scenario, based on network conditions, bandwidth availability, delay
requirements, and packet-loss tolerance.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <a href="./rfc2119">RFC 2119</a> [<a href="#ref-3" title=""Key words for use in RFCs to Indicate Requirement Levels"">3</a>].
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Background</span>
The 3rd Generation Partnership Project 2 (3GPP2) has published two
standards which define speech compression algorithms for CDMA
applications: EVRC [<a href="#ref-1" title=""Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems"">1</a>] and SMV [<a href="#ref-2" title=""Selectable Mode Vocoder, Service Option for Wideband Spread Spectrum Communication Systems"">2</a>]. EVRC is currently deployed in
millions of first and second generation CDMA handsets. SMV is the
preferred speech codec standard for CDMA2000, and will be deployed in
third generation handsets in addition to EVRC. Improvements and new
codecs will keep emerging as technology improves, and future handsets
will likely support multiple codecs.
<span id="page-3" ></span>
The formats of the EVRC and SMV codec frames are very similar. Many
other vocoders also share common characteristics, and have many
similar application scenarios. This parallelism enables an RTP
payload format to be designed for EVRC and SMV that may also support
other, similar vocoders with minimal additional specification work.
This can simplify the protocol for transporting vocoder data frames
through RTP and reduce the complexity of implementations.
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. The Codecs Supported</span>
<span class="h3"><a class="selflink" id="section-3.1" href="#section-3.1">3.1</a>. EVRC</span>
The Enhanced Variable Rate Codec (EVRC) [<a href="#ref-1" title=""Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems"">1</a>] compresses each 20
milliseconds of 8000 Hz, 16-bit sampled speech input into output
frames in one of the three different sizes: Rate 1 (171 bits), Rate
1/2 (80 bits), or Rate 1/8 (16 bits). In addition, there are two
zero bit codec frame types: null frames and erasure frames. Null
frames are produced as a result of the vocoder running at rate 0.
Null frames are zero bits long and are normally not transmitted.
Erasure frames are the frames that the receiver substitutes, as input
to the codec, for lost or damaged frames. Erasure frames are also zero
bits long and are normally not transmitted.
The codec chooses the output frame rate based on analysis of the
input speech and the current operating mode (either normal or one of
several reduced rate modes). For typical speech patterns, this
results in an average output of 4.2 kilobits/second for normal mode
and a lower average output for reduced rate modes.
<span class="h3"><a class="selflink" id="section-3.2" href="#section-3.2">3.2</a>. SMV</span>
The Selectable Mode Vocoder (SMV) [<a href="#ref-2" title=""Selectable Mode Vocoder, Service Option for Wideband Spread Spectrum Communication Systems"">2</a>] compresses each 20 milliseconds
of 8000 Hz, 16-bit sampled speech input into output frames of one of
the four different sizes: Rate 1 (171 bits), Rate 1/2 (80 bits), Rate
1/4 (40 bits), or Rate 1/8 (16 bits). In addition, there are two
zero bit codec frame types: null frames and erasure frames. Null
frames are produced as a result of the vocoder running at rate 0.
Null frames are zero bits long and are normally not transmitted.
Erasure frames are the frames that the receiver substitutes, as input
to the codec, for lost or damaged frames. Erasure frames are also zero
bits long and are normally not transmitted.
The SMV codec can operate in six modes. Each mode may produce frames
of any of the rates (full rate to 1/8 rate) for varying percentages
of time, based on the characteristics of the speech samples and the
selected mode. The SMV mode can change on a
frame-by-frame basis. The SMV codec does not need additional
information other than the codec data frames to correctly decode the<span id="page-4" ></span>
data of various modes; therefore, the mode of the encoder does not
need to be transmitted with the encoded frames.
The SMV codec chooses the output frame rate based on analysis of the
input speech and the current operating mode. For typical speech
patterns, this results in an average output of 4.2 kilobits/second
for Mode 0 in two-way conversation (approximately 50% active speech
time and 50% in eighth rate while listening) and lower for other
reduced rate modes. SMV is more bandwidth efficient than EVRC. EVRC
is equivalent in performance to SMV mode 1.
<span class="h3"><a class="selflink" id="section-3.3" href="#section-3.3">3.3</a>. Other Frame-Based Vocoders</span>
Other frame-based vocoders can be carried in the packet format
defined in this document, as long as they possess the following
properties:
o The codec is frame-based;
o blank and erasure frames are supported;
o the total number of rates is less than 17;
o the maximum full rate frame can be transported in a single RTP
packet using this specific format.
Vocoders with the characteristics listed above can be transported
using the packet format specified in this document with some
additional specification work; the pieces that must be defined are
listed in <a href="#section-15">Section 15</a>.
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. RTP/Vocoder Packet Format</span>
The vocoder speech data may be transmitted in either of the two RTP
packet formats specified in the following two subsections, as
appropriate for the application scenario. In the packet format
diagrams shown in this document, bit 0 is the most significant bit.
<span id="page-5" ></span>
<span class="h3"><a class="selflink" id="section-4.1" href="#section-4.1">4.1</a>. Interleaved/Bundled Packet Format</span>
This format is used to send one or more vocoder frames per packet.
Interleaving or bundling MAY be used. The RTP packet for this format
is as follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      RTP Header [<a href="#ref-4" title=""RTP: A Transport Protocol for Real-Time Applications"">4</a>]                           |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|R|R| LLL | NNN | MMM |  Count  |  TOC  |  ...  |  TOC  |padding|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        one or more codec data frames, one per TOC entry       |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RTP header has the expected values as described in the RTP
specification [<a href="#ref-4" title=""RTP: A Transport Protocol for Real-Time Applications"">4</a>]. The RTP timestamp is in 1/8000 of a second units
for EVRC and SMV. For any other vocoders that use this packet
format, the timestamp unit needs to be defined explicitly. The M bit
should be set as specified in the applicable RTP profile, for
example, <a href="./rfc3551">RFC 3551</a> [<a href="#ref-5" title=""RTP Profile for Audio and Video Conferences with Minimal Control"">5</a>]. Note that <a href="./rfc3551">RFC 3551</a> [<a href="#ref-5" title=""RTP Profile for Audio and Video Conferences with Minimal Control"">5</a>] specifies that if the
sender does not suppress silence, the M bit will always be zero.
When multiple codec data frames are present in a single RTP packet,
the timestamp is that of the oldest data represented in the RTP
packet. The assignment of an RTP payload type for this packet format
is outside the scope of this document; it is specified by the RTP
profile under which this payload format is used.
The first octet of an Interleaved/Bundled format packet is the
Interleave Octet. The second octet contains the Mode Request and
Frame Count fields. The Table of Contents (ToC) field then follows.
The fields are specified as follows:
Reserved (RR): 2 bits
Reserved bits. MUST be set to zero by sender, SHOULD be ignored
by receiver.
Interleave Length (LLL): 3 bits
Indicates the length of interleave; a value of 0 indicates
bundling, a special case of interleaving. See <a href="#section-6">Section 6</a> and
<a href="#section-7">Section 7</a> for more detailed discussion.
<span id="page-6" ></span>
Interleave Index (NNN): 3 bits
Indicates the index within an interleave group. MUST have a value
less than or equal to the value of LLL. Values of NNN greater
than the value of LLL are invalid. Packets with invalid NNN values
SHOULD be ignored by the receiver.
Mode Request (MMM): 3 bits
The Mode Request field is used to signal Mode Request information.
See <a href="#section-10">Section 10</a> for details.
Frame Count (Count): 5 bits
The number of ToC fields (and vocoder frames) present in the
packet is the value of the frame count field plus one. A value of
zero indicates that the packet contains one ToC field, while a
value of 31 indicates that the packet contains 32 ToC fields.
Padding (padding): 0 or 4 bits
This padding ensures that codec data frames start on an octet
boundary. When the number of ToC entries (the value of the Count
field plus one) is odd, the sender MUST add 4 bits of padding
following the last ToC entry. When the number of ToC entries is even,
the sender MUST NOT add padding bits. If padding is present, the
padding bits MUST be set to zero by sender, and SHOULD be ignored
by receiver.
The Table of Contents field (ToC) provides information on the codec
data frame(s) in the packet. There is one ToC entry for each codec
data frame. The detailed formats of the ToC field and codec data
frames are specified in <a href="#section-5">Section 5</a>.
Multiple data frames may be included within an Interleaved/Bundled
packet using interleaving or bundling as described in <a href="#section-6">Section 6</a> and
<a href="#section-7">Section 7</a>.
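The field layout above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names BundledHeader and parse_bundled_header are hypothetical, the input is assumed to be the RTP payload with the RTP header already removed, and padding is assumed to be present exactly when the number of ToC entries is odd (so that the codec data frames start on an octet boundary).

```python
from dataclasses import dataclass


@dataclass
class BundledHeader:
    interleave_length: int    # LLL
    interleave_index: int     # NNN
    mode_request: int         # MMM
    frame_types: list[int]    # one 4-bit ToC entry per codec data frame
    data_offset: int          # octet offset of the first codec data frame


def parse_bundled_header(payload: bytes) -> BundledHeader:
    """Parse the Interleave Octet, the MMM/Count octet, and the ToC entries."""
    if len(payload) < 3:
        raise ValueError("payload too short for header plus one ToC entry")
    lll = (payload[0] >> 3) & 0x07      # bits 2-4; reserved bits 0-1 are ignored
    nnn = payload[0] & 0x07             # bits 5-7
    if nnn > lll:
        raise ValueError("invalid packet: NNN exceeds LLL")
    mmm = payload[1] >> 5               # bits 0-2 of the second octet
    n_frames = (payload[1] & 0x1F) + 1  # Count field value plus one
    toc_octets = (n_frames + 1) // 2    # rounds up, covering the 4 padding bits
    if len(payload) < 2 + toc_octets:
        raise ValueError("payload too short for the ToC")
    frame_types = []
    for i in range(n_frames):
        octet = payload[2 + i // 2]
        # Even-numbered entries sit in the high nibble, odd ones in the low nibble.
        frame_types.append(octet >> 4 if i % 2 == 0 else octet & 0x0F)
    return BundledHeader(lll, nnn, mmm, frame_types, 2 + toc_octets)
```

For example, a payload beginning with the octets 0x08 0x02 0x41 0x30 would be read as LLL=1, NNN=0, MMM=0, and three ToC entries (4, 1, 3) followed by four padding bits, with codec data starting at octet 4.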
<span class="h3"><a class="selflink" id="section-4.2" href="#section-4.2">4.2</a>. Header-Free Packet Format</span>
The Header-Free Packet Format is designed for maximum bandwidth
efficiency and low latency. Only one codec data frame can be sent in
each Header-Free format packet. None of the payload header fields
(LLL, NNN, MMM, Count) nor ToC entries are present. The codec rate
for the data frame can be determined from the length of the codec
data frame, since there is only one codec data frame in each
Header-Free packet.
Use of the RTP header fields for Header-Free RTP/Vocoder Packet
Format is the same as described in <a href="#section-4.1">Section 4.1</a> for
Interleaved/Bundled RTP/Vocoder Packet Format. The detailed format
of the codec data frame is specified in <a href="#section-5">Section 5</a>.
<span id="page-7" ></span>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      RTP Header [<a href="#ref-4" title=""RTP: A Transport Protocol for Real-Time Applications"">4</a>]                           |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               |
+          ONLY one codec data frame            +-+-+-+-+-+-+-+-+
|                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
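Because a Header-Free packet carries exactly one codec data frame, a receiver can infer the rate from the payload length alone. The sketch below assumes the octet sizes given for EVRC and SMV in Section 5.1 (2, 5, 10, and 22 octets); the function name header_free_rate and the codec argument are hypothetical, and how a receiver treats other lengths is left as a simple error here.

```python
# Payload length in octets -> rate, using the EVRC/SMV frame sizes of Section 5.1.
RATE_BY_LENGTH = {2: "1/8", 5: "1/4", 10: "1/2", 22: "1"}


def header_free_rate(payload: bytes, codec: str = "SMV") -> str:
    """Infer the codec rate of a Header-Free packet from its payload length."""
    rate = RATE_BY_LENGTH.get(len(payload))
    # EVRC has no Rate 1/4 frames, so a 5-octet payload is not a valid EVRC frame.
    if rate is None or (codec == "EVRC" and rate == "1/4"):
        raise ValueError(f"unexpected {codec} frame length: {len(payload)} octets")
    return rate
```

A 22-octet payload is thus taken to be a Rate 1 frame, and a 10-octet payload a Rate 1/2 frame.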
<span class="h3"><a class="selflink" id="section-4.3" href="#section-4.3">4.3</a>. Determining the Format of Packets</span>
All receivers SHOULD be able to process both packet formats. The
sender MAY choose to use one or both packet formats.
A receiver MUST have prior knowledge of the packet format to
correctly decode the RTP packets. When packets of both formats are
used within the same session, different RTP payload type values MUST
be used for each format to distinguish the packet formats. The
association of payload type number with the packet format is done
out-of-band, for example by SDP during the setup of a session.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Packet Table of Contents Entries and Codec Data Frame Format</span>
<span class="h3"><a class="selflink" id="section-5.1" href="#section-5.1">5.1</a>. Packet Table of Contents entries</span>
Each codec data frame in an Interleaved/Bundled packet has a
corresponding Table of Contents (ToC) entry. The ToC entry indicates
the rate of the codec frame. (Header-Free packets MUST NOT have a
ToC field.)
Each ToC entry occupies four bits. The format of the bits is
indicated below:
 0 1 2 3
+-+-+-+-+
|fr type|
+-+-+-+-+
Frame Type: 4 bits
The frame type indicates the type of the corresponding codec data
frame in the RTP packet.
<span id="page-8" ></span>
For EVRC and SMV codecs, the frame type values and size of the
associated codec data frame are described in the table below:
Value   Rate      Total codec data frame size (in octets)
---------------------------------------------------------
  0     Blank      0    (0 bit)
  1     1/8        2    (16 bits)
  2     1/4        5    (40 bits; not valid for EVRC)
  3     1/2       10    (80 bits)
  4     1         22    (171 bits; 5 padded at end with zeros)
  5     Erasure    0    (SHOULD NOT be transmitted by sender)
All values not listed in the above table MUST be considered reserved.
A ToC entry with a reserved Frame Type value SHOULD be considered
invalid. Note that the EVRC codec does not have 1/4 rate frames,
thus frame type value 2 MUST be considered a reserved value when the
EVRC codec is in use.
Other vocoders that use this packet format need to specify their own
table of frame types and corresponding codec data frames.
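Given the ToC entries of an Interleaved/Bundled packet, the frame type table lets a receiver cut the remaining payload into individual codec data frames. The following is a sketch under stated assumptions: split_frames is a hypothetical helper, its input is the list of 4-bit frame type values plus the octets that follow the ToC (and any padding), and reserved values are rejected outright, which is one possible reading of "SHOULD be considered invalid".

```python
# Frame type value -> codec data frame size in octets (EVRC and SMV).
FRAME_SIZE_OCTETS = {0: 0, 1: 2, 2: 5, 3: 10, 4: 22, 5: 0}


def split_frames(frame_types: list[int], data: bytes, codec: str = "SMV") -> list[bytes]:
    """Split the codec data area into one frame per ToC entry."""
    frames = []
    pos = 0
    for ft in frame_types:
        # Unlisted values are reserved; value 2 (Rate 1/4) is reserved for EVRC.
        if ft not in FRAME_SIZE_OCTETS or (codec == "EVRC" and ft == 2):
            raise ValueError(f"reserved frame type {ft}")
        size = FRAME_SIZE_OCTETS[ft]
        if pos + size > len(data):
            raise ValueError("codec data shorter than the ToC implies")
        frames.append(data[pos:pos + size])
        pos += size
    return frames
```

Blank and Erasure entries yield empty frames, so the returned list always has the same length as the ToC.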
<span class="h3"><a class="selflink" id="section-5.2" href="#section-5.2">5.2</a>. Codec Data Frames</span>
The output of the vocoder MUST be converted into codec data frames
for inclusion in the RTP payload. The conversions for EVRC and SMV
codecs are specified below. (Note: Because the EVRC codec does not
have Rate 1/4 frames, the specification of Rate 1/4 frames does not
apply to EVRC codec data frames). Other vocoders that use this packet
format need to specify how to convert vocoder output data into
frames.
The codec output data bits as numbered in EVRC and SMV are packed
into octets. The lowest numbered bit (bit 1 for Rate 1, Rate 1/2,
Rate 1/4 and Rate 1/8) is placed in the most significant bit
(internet bit 0) of octet 1 of the codec data frame, the second
lowest bit is placed in the second most significant bit of the first
octet, the third lowest in the third most significant bit of the
first octet, and so on. This continues until all of the bits have
been placed in the codec data frame.
The remaining unused bits of the last octet of the codec data frame
MUST be set to zero. Note that in EVRC and SMV this is only
applicable to Rate 1 frames (171 bits) as the Rate 1/2 (80 bits),
Rate 1/4 (40 bits, SMV only) and Rate 1/8 frames (16 bits) fit
exactly into a whole number of octets.
Following is a detailed listing showing a Rate 1 EVRC/SMV codec
output frame converted into a codec data frame:
<span id="page-9" ></span>
The codec data frame for an EVRC/SMV codec Rate 1 frame is 22 octets
long. Bits 1 through 171 from the EVRC/SMV codec Rate 1 frame are
placed as indicated, with bits marked with "Z" set to zero. EVRC/SMV
codec Rate 1/8, Rate 1/4 and Rate 1/2 frames are converted similarly,
but do not require zero padding because they align on octet
boundaries.
Rate 1 codec data frame
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
|0|0|0|0|0|0|0|0|0|1|1|1|1|1|1|1|1|1|1|2|2|2|2|2|2|2|2|2|2|3|3|3|
|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1| | | | | |
|4|4|4|4|4|5|5|5|5|5|5|5|5|5|5|6|6|6|6|6|6|6|6|6|6|7|7|Z|Z|Z|Z|Z|
|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1| | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
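The packing rule of Section 5.2 (lowest numbered codec bit into the most significant bit of the first octet, unused trailing bits set to zero) can be sketched as follows. The helper name pack_codec_bits and its input representation, a sequence of 0/1 values where element 0 is codec bit 1, are assumptions made for illustration.

```python
def pack_codec_bits(bits: list[int]) -> bytes:
    """Pack codec output bits MSB-first into octets, zero-filling the last octet."""
    out = bytearray((len(bits) + 7) // 8)   # all octets start as zero
    for i, bit in enumerate(bits):
        if bit:
            # Codec bit i+1 goes to octet i // 8, bit position i % 8 from the MSB.
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)
```

A 171-bit Rate 1 frame therefore occupies 22 octets, with only the top three bits of the last octet carrying codec data and the remaining five bits left at zero.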
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Interleaving Codec Data Frames</span>
As indicated in <a href="#section-4.1">Section 4.1</a>, more than one codec data frame MAY be
included in a single Interleaved/Bundled packet by a sender. This is
accomplished by interleaving or bundling.
Bundling is used to spread the transmission overhead of the RTP and
payload header over multiple vocoder frames. Interleaving
additionally reduces the listener's perception of data loss by
spreading such loss over non-consecutive vocoder frames. EVRC, SMV,
and similar vocoders are able to compensate for an occasional lost
frame, but speech quality degrades exponentially with consecutive
frame loss.
Bundling is signaled by setting the LLL field to zero and the Count
field to greater than zero. Interleaving is indicated by setting the
LLL field to a value greater than zero.
The discussions on general interleaving apply to the bundling (which
can be viewed as a reduced case of interleaving) with reduced
complexity. The bundling case is discussed in detail in <a href="#section-7">Section 7</a>.
Senders MAY support interleaving and/or bundling. All receivers that
support the Interleaved/Bundled packet format MUST support both
interleaving and bundling.
<span id="page-10" ></span>
Given a time-ordered sequence of output frames from the codec
numbered 0..n, a bundling value B (the value in the Count field plus
one), and an interleave length L where n = B * (L+1) - 1, the output
frames are placed into RTP packets as follows (the values of the
fields LLL and NNN are indicated for each RTP packet):
First RTP Packet in Interleave group:
LLL=L, NNN=0
Frame 0, Frame L+1, Frame 2(L+1), Frame 3(L+1), ... for a total of
B frames
Second RTP Packet in Interleave group:
LLL=L, NNN=1
Frame 1, Frame 1+L+1, Frame 1+2(L+1), Frame 1+3(L+1), ... for a
total of B frames
This continues to the last RTP packet in the interleave group:
(L+1)th RTP Packet in Interleave group:
LLL=L, NNN=L
Frame L, Frame L+L+1, Frame L+2(L+1), Frame L+3(L+1), ... for a
total of B frames
Within each interleave group, the RTP packets making up the
interleave group MUST be transmitted in value-increasing order of the
NNN field. While this does not guarantee reduced end-to-end delay on
the receiving end, when packets are delivered in order by the
underlying transport, delay will be reduced to the minimum possible.
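The placement rule above can be sketched in a few lines. The function names interleave and deinterleave are hypothetical, packets are modeled as (LLL, NNN, frames) tuples rather than real RTP packets, and a slot left as None stands for a frame whose packet never arrived (for which a receiver would substitute an erasure frame).

```python
def interleave(frames: list, L: int, B: int) -> list[tuple]:
    """Distribute B * (L+1) time-ordered frames over L+1 packets."""
    if len(frames) != B * (L + 1):
        raise ValueError("an interleave group needs exactly B * (L+1) frames")
    # Packet n carries frames n, n+(L+1), n+2(L+1), ... for a total of B frames.
    return [(L, n, [frames[n + k * (L + 1)] for k in range(B)])
            for n in range(L + 1)]


def deinterleave(packets: list[tuple], L: int, B: int) -> list:
    """Rebuild time order from the packets of one group that were received."""
    out = [None] * (B * (L + 1))
    for _lll, nnn, group_frames in packets:
        for k, frame in enumerate(group_frames):
            out[nnn + k * (L + 1)] = frame
    return out
```

With L=2 and B=2, frames 0..5 are sent as packets carrying (0, 3), (1, 4), and (2, 5). If the second packet is lost, the receiver is missing frames 1 and 4, which are not consecutive.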
Receivers MAY signal the maximum number of codec data frames (i.e.,
the maximum acceptable bundling value B) they can handle in a single
RTP packet using the OPTIONAL maxptime RTP mode parameter identified
in <a href="#section-12">Section 12</a>.
Receivers MAY signal the maximum interleave length (i.e., the maximum
acceptable LLL value in the Interleaving Octet) they will accept
using the OPTIONAL maxinterleave RTP mode parameter identified in
<a href="#section-12">Section 12</a>.
The parameters maxptime and maxinterleave are exchanged at the
initial setup of the session. In one-to-one sessions, the sender
MUST respect these values set by the receiver, and MUST NOT
interleave/bundle more packets than what the receiver signals that it
can handle. This ensures that the receiver can allocate a known
amount of buffer space that will be sufficient for all
interleaving/bundling used in that session. During the session, the
sender may decrease the bundling value or interleaving length (so
that less buffer space is required at the receiver), but never exceed
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-11" ></span>
the maximum value set by the receiver. This prevents the situation
where a receiver needs to allocate more buffer space in the middle of
a session but is unable to do so.
Additionally, senders have the following restrictions:
o MUST NOT bundle more codec data frames in a single RTP packet than
indicated by maxptime (see <a href="#section-12">Section 12</a>) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet
than will fit in the MTU of the underlying network.
o Once beginning a session with a given maximum interleaving value
set by maxinterleave in <a href="#section-12">Section 12</a>, MUST NOT increase the
interleaving value (LLL) to exceed the maximum interleaving value
that is signaled.
o MAY change the interleaving value, but MUST do so only between
interleave groups.
o Silence suppression MUST only be used between interleave groups.
A ToC with Frame Type 0 (Blank Frame, <a href="#section-5.1">Section 5.1</a>) MUST be used
within interleaving groups if the codec outputs a blank frame.
The M bit in the RTP header is not set for these blank frames, as
the stream is continuous in time. Because there is only one time
stamp for each RTP packet, silence suppression used within an
interleave group would cause ambiguities when reconstructing the
speech at the receiver side, and thus is prohibited.
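The limits imposed by maxptime and maxinterleave can be expressed as
a small check on the sender side. The sketch below is illustrative:
the function names are hypothetical, and it assumes the defaults and
the 20 msec frame duration given with the parameter definitions in
Section 12.

```python
FRAME_DURATION_MS = 20       # duration of a single codec data frame
DEFAULT_MAXPTIME_MS = 200    # default when maxptime is not signaled
DEFAULT_MAXINTERLEAVE = 5    # default when maxinterleave is not signaled


def sender_limits(maxptime_ms=None, maxinterleave=None):
    """Return (largest bundling value B, largest LLL) a sender may use."""
    if maxptime_ms is None:
        maxptime_ms = DEFAULT_MAXPTIME_MS
    if maxinterleave is None:
        maxinterleave = DEFAULT_MAXINTERLEAVE
    return maxptime_ms // FRAME_DURATION_MS, maxinterleave


def is_allowed(bundling_value, interleave_length,
               maxptime_ms=None, maxinterleave=None):
    """True if the chosen B and LLL stay within the receiver's limits."""
    max_b, max_l = sender_limits(maxptime_ms, maxinterleave)
    return (bundling_value in range(1, max_b + 1)
            and interleave_length in range(0, max_l + 1))
```

The MTU restriction is not modeled here, because it depends on the
frame sizes in use and on the underlying network.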
Given an RTP packet with sequence number S, interleave length (field
LLL) L, interleave index value (field NNN) N, and bundling value B,
the interleave group consists of this RTP packet and other RTP
packets with sequence numbers from (S-N) mod 65536 to (S-N+L) mod
65536 inclusive. In other words, the interleave group always consists of
L+1 RTP packets with sequential sequence numbers. The bundling value
for all RTP packets in an interleave group MUST be the same.
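As a short illustration of the sequence number arithmetic, including
wrap-around at 65536, a receiver could list the members of a group as
follows. The function name is an assumption made for this example.

```python
def interleave_group_seq_numbers(S, L, N):
    """Sequence numbers of the L+1 packets in the interleave group that
    contains the packet with sequence number S and NNN value N."""
    first = (S - N) % 65536   # sequence number of the packet with NNN=0
    return [(first + i) % 65536 for i in range(L + 1)]


# A group that straddles the 16-bit wrap-around point:
# S = 65535, L = 2, N = 1 gives 65534, 65535, 0.
members = interleave_group_seq_numbers(65535, 2, 1)
```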
The receiver determines the expected bundling value for all RTP
packets in an interleave group by the number of codec data frames
bundled in the first RTP packet of the interleave group received.
Note that this may not be the first RTP packet of the interleave
group if packets are delivered out of order by the underlying
transport.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-12" ></span>
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. Bundling Codec Data Frames</span>
As discussed in <a href="#section-6">Section 6</a>, the bundling of codec data frames is a
special reduced case of interleaving with LLL value in the Interleave
Octet set to 0.
When codec data frames are bundled, multiple data frames are included
consecutively in a single packet, because the interleaving length
(LLL) is 0. The interleave group is thus reduced to a single RTP
packet, and the reconstruction of the codec data frames from RTP
packets becomes a much simpler process.
Furthermore, the additional restrictions on senders are reduced to:
o MUST NOT bundle more codec data frames in a single RTP packet than
indicated by maxptime (see <a href="#section-12">Section 12</a>) if it is signaled.
o SHOULD NOT bundle more codec data frames in a single RTP packet
than will fit in the MTU of the underlying network.
<span class="h2"><a class="selflink" id="section-8" href="#section-8">8</a>. Handling Missing Codec Data Frames</span>
The vocoders covered by this payload format support erasure frames as
an indication when frames are not available. The erasure frames are
normally used internally by a receiver to advance the state of the
voice decoder by exactly one frame time for each missing frame.
Using the information from packet sequence number, time stamp, and
the M bit, the receiver can detect missing codec data frames from RTP
packet loss and/or silence suppression, and generate corresponding
erasure frames. Erasure frames MUST also be used in storage format
to record missing frames.
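One possible way to count the erasure frames to generate is sketched
below. It rests on assumptions that are not stated in this section:
an 8000 Hz RTP clock (as in the rtpmap examples) with 20 msec frames,
hence 160 timestamp units per frame; bundled packets only (LLL = 0);
and a time stamp that refers to the first frame in the packet. The
function name is hypothetical, and the use of the sequence number and
the M bit to tell packet loss from silence suppression is left out.

```python
TICKS_PER_FRAME = 160   # assumption: 8000 Hz clock x 20 msec per frame


def missing_frames(prev_timestamp, prev_frame_count, timestamp):
    """Number of frames missing between the end of the previous packet
    and the start of the packet carrying `timestamp`.

    Time stamps are 32-bit values, so arithmetic is done modulo 2**32.
    """
    expected = (prev_timestamp + prev_frame_count * TICKS_PER_FRAME) % 2**32
    gap_ticks = (timestamp - expected) % 2**32
    if gap_ticks >= 2**31:
        return 0            # packet is older than expected; no gap to fill
    return gap_ticks // TICKS_PER_FRAME
```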
<span class="h2"><a class="selflink" id="section-9" href="#section-9">9</a>. Implementation Issues</span>
<span class="h3"><a class="selflink" id="section-9.1" href="#section-9.1">9.1</a>. Interleaving Length</span>
The vocoder interpolates the missing speech content when given an
erasure frame. However, the best quality is perceived by the
listener when erasure frames are not consecutive. This makes
interleaving desirable as it increases speech quality when packet
loss occurs.
On the other hand, interleaving can greatly increase the end-to-end
delay. Where an interactive session is desired, either
Interleaved/Bundled packet format with interleaving length (field
LLL) 0 or Header-Free packet format is RECOMMENDED.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-13" ></span>
When end-to-end delay is not a primary concern, an interleaving
length (field LLL) of 4 or 5 is RECOMMENDED as it offers a reasonable
compromise between robustness and latency.
<span class="h3"><a class="selflink" id="section-9.2" href="#section-9.2">9.2</a>. Validation of Received Packets</span>
When receiving an RTP packet, the receiver SHOULD check the validity
of the ToC fields and match the length of the packet with what is
indicated by the ToC fields. If any invalidity or mismatch is
detected, it is RECOMMENDED to discard the received packet to avoid
potential severe degradation of the speech quality. The discarded
packet is treated following the same procedure as a lost packet, and
the discarded data will be replaced with erasure frames.
On receipt of an RTP packet with an invalid value of the LLL or NNN
fields, the RTP packet SHOULD be treated as lost by the receiver for
the purpose of generating erasure frames as described in <a href="#section-8">Section 8</a>.
On receipt of an RTP packet in an interleave group with other than
the expected frame count value, the receiver MAY discard codec data
frames off the end of the RTP packet or add erasure codec data frames
to the end of the packet in order to manufacture a substitute packet
with the expected bundling value. The receiver MAY instead choose to
discard the whole interleave group.
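The first option above (manufacturing a substitute packet) could look
like the following sketch. Frames are modeled as (frame type, payload)
pairs; this representation, the function name, and the empty payload
of the erasure frame are assumptions made for the example. Frame type
5 is the erasure frame.

```python
ERASURE = (5, b"")   # frame type 5 (erasure); empty payload is assumed


def normalize_frames(frames, expected_count):
    """Return exactly `expected_count` frames: extra frames are dropped
    from the end, and missing ones are filled with erasure frames."""
    result = list(frames[:expected_count])
    result += [ERASURE] * (expected_count - len(result))
    return result
```

A receiver that prefers the second option would instead drop every
packet of the interleave group.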
<span class="h3"><a class="selflink" id="section-9.3" href="#section-9.3">9.3</a>. Processing the Late Packets</span>
Assume that the receiver has begun playing frames from an interleave
group. The time has come to play frame x from packet n of the
interleave group. Further assume that packet n of the interleave
group has not been received. As described in <a href="#section-8">Section 8</a>, an erasure
frame will be sent to the receiving vocoder.
Now, assume that packet n of the interleave group arrives before
frame x+1 of that packet is needed. Receivers should use frame x+1
of the newly received packet n rather than substituting an erasure
frame. In other words, just because packet n was not available the
first time it was needed to reconstruct the interleaved speech, the
receiver should not assume it is not available when it is
subsequently needed for interleaved speech reconstruction.
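The behavior described above follows naturally if the receiver looks
up each frame at playout time instead of deciding once per packet.
The class below is a minimal sketch under that assumption; its name,
its methods, and the string used for the erasure frame are all
illustrative.

```python
ERASURE = "erasure"   # placeholder standing in for an erasure frame


class InterleaveGroup:
    """Frames of one interleave group, indexed by the packet's NNN value."""

    def __init__(self, interleave_length, bundling_value):
        self.L = interleave_length
        self.B = bundling_value
        self.packets = {}                  # NNN -> list of B frames

    def add_packet(self, nnn, frames):
        """Store a packet; it may arrive after playout has started."""
        self.packets[nnn] = list(frames)

    def frame_for_playout(self, i):
        """Frame i of the group in time order, or an erasure if the
        packet that carries it has not arrived yet."""
        nnn, position = i % (self.L + 1), i // (self.L + 1)
        packet = self.packets.get(nnn)     # checked at playout time
        return ERASURE if packet is None else packet[position]
```

With L = 1 and B = 2, frame 1 is played as an erasure if packet 1 is
still missing, but frame 3 is taken from packet 1 if that packet has
arrived by then.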
<span class="h2"><a class="selflink" id="section-10" href="#section-10">10</a>. Mode Request</span>
The Mode Request signal requests a particular encoding mode for the
speech encoding in the reverse direction. All implementations are
RECOMMENDED to honor the Mode Request signal. The Mode Request
signal SHOULD only be used in one-to-one sessions. In multi-party
sessions, any received Mode Request signals SHOULD be ignored.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-14" ></span>
In addition, the Mode Request signal MAY also be sent through non-RTP
means, which is out of the scope of this specification.
The three-bit Mode Request field is used to signal the receiver to
set a particular encoding mode to its audio encoder. If the Mode
Request field is set to a valid value in RTP packets from node A to
node B, it is a request for node B to change to the requested
encoding mode for its audio encoder and therefore the bit rate of the
RTP stream from node B to node A. Once a node sets this field to a
value, it SHOULD continue to set the field to the same value in
subsequent packets until the requested mode is different. This
design helps to eliminate the scenario of getting the codec stuck in
an unintended state if one of the packets that carries the Mode
Request is lost. An otherwise silent node MAY send an RTP packet
containing a blank frame in order to send a Mode Request.
Each codec type using this format SHOULD define its own
interpretation of the Mode Request field. Codecs SHOULD follow the
convention that higher values of the three-bit field correspond to an
equal or lower average output bit rate.
For the EVRC codec, the Mode Request field MUST be interpreted
according to Tables 2.2.1.2-1 and 2.2.1.2-2 of the EVRC codec
specifications [<a href="#ref-1" title="&quot;Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems&quot;">1</a>].
For the SMV codec, the Mode Request field MUST be interpreted according
to Table 2.2-2 of the SMV codec specifications [<a href="#ref-2" title="&quot;Selectable Mode Vocoder, Service Option for Wideband Spread Spectrum Communication Systems&quot;">2</a>].
<span class="h2"><a class="selflink" id="section-11" href="#section-11">11</a>. Storage Format</span>
The storage format is used for storing speech frames, e.g., as a file
or e-mail attachment.
The file begins with a magic number to identify the vocoder that is
used. The magic number for EVRC corresponds to the ASCII character
string "#!EVRC\n", i.e., "0x23 0x21 0x45 0x56 0x52 0x43 0x0A". The
magic number for SMV corresponds to the ASCII character string
"#!SMV\n", i.e., "0x23 0x21 0x53 0x4d 0x56 0x0a".
The codec data frames are stored in consecutive order, with a single
ToC entry field, extended to one octet, prefixing each codec data
frame. The ToC field is extended to one octet by setting the four
most significant bits of the octet to zero. For example, a ToC value
of 4 (a full-rate frame) is stored as 0x04.
Speech frames lost in transmission and non-received frames MUST be
stored as erasure frames (frame type 5, see definition in <a href="#section-5.1">Section</a>
<a href="#section-5.1">5.1</a>) to maintain synchronization with the original media.
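A writer for this storage format can be sketched as follows. The
function name and the (frame type, payload) representation are
assumptions made for the example, and the two-octet payload is a
placeholder rather than a real codec frame. A reader is not shown,
because it needs the frame size for each frame type.

```python
import io

MAGIC = {"EVRC": b"#!EVRC\n", "SMV": b"#!SMV\n"}
ERASURE_TYPE = 5   # frame type of the erasure frame


def write_storage(stream, codec, frames):
    """Write (frame_type, payload) pairs to a binary stream."""
    stream.write(MAGIC[codec])                 # magic number comes first
    for frame_type, payload in frames:
        if frame_type not in range(16):
            raise ValueError("ToC value must fit in four bits")
        stream.write(bytes([frame_type]))      # upper four bits are zero
        stream.write(payload)


buf = io.BytesIO()
write_storage(buf, "EVRC", [(4, b"\xaa\xbb"), (ERASURE_TYPE, b"")])
```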
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-15" ></span>
<span class="h2"><a class="selflink" id="section-12" href="#section-12">12</a>. IANA Considerations</span>
Four new MIME sub-types as described in this section have been
registered by the IANA.
The MIME-names for the EVRC and SMV codec are allocated from the IETF
tree since all the vocoders covered are expected to be widely used
for Voice-over-IP applications.
<span class="h3"><a class="selflink" id="section-12.1" href="#section-12.1">12.1</a>. Registration of Media Type EVRC</span>
Media Type Name: audio
Media Subtype Name: EVRC
Required Parameter: none
Optional parameters:
The following parameters apply to RTP transfer only.
ptime: Defined as usual for RTP audio (see <a href="./rfc2327">RFC 2327</a>).
maxptime: The maximum amount of media which can be encapsulated in
each packet, expressed as time in milliseconds. The time SHALL
be calculated as the sum of the time the media present in the
packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200 milliseconds.
maxinterleave: Maximum number for interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not
signaled, the maxinterleave length SHALL be 5.
Encoding considerations:
This type is defined for transfer of EVRC-encoded data via RTP
using the Interleaved/Bundled packet format specified in Sections
4.1, 6, and 7 of <a href="./rfc3558">RFC 3558</a>. It is also defined for other transfer
methods using the storage format specified in Section 11 of <a href="./rfc3558">RFC</a>
<a href="./rfc3558">3558</a>.
Security considerations:
See <a href="#section-14">Section 14</a> "Security Considerations" of <a href="./rfc3558">RFC 3558</a>.
Public specification:
The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods
are specified in <a href="./rfc3558">RFC 3558</a>.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-16" ></span>
Additional information:
The following information applies to storage format only.
Magic number: #!EVRC\n (see <a href="./rfc3558#section-11">Section 11 of RFC 3558</a>)
File extensions: evc, EVC
Macintosh file type code: none
Object identifier or OID: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Person & email address to contact for further information:
Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
<span class="h3"><a class="selflink" id="section-12.2" href="#section-12.2">12.2</a>. Registration of Media Type EVRC0</span>
Media Type Name: audio
Media Subtype Name: EVRC0
Required Parameters: none
Optional parameters: none
Encoding considerations:
This type is only defined for transfer of EVRC-encoded data via
RTP using the Header-Free packet format specified in <a href="./rfc3558#section-4.2">Section 4.2
of RFC 3558</a>.
Security considerations:
See <a href="#section-14">Section 14</a> "Security Considerations" of <a href="./rfc3558">RFC 3558</a>.
Public specification:
The EVRC vocoder is specified in 3GPP2 C.S0014. Transfer methods
are specified in <a href="./rfc3558">RFC 3558</a>.
Additional information: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-17" ></span>
Person & email address to contact for further information:
Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
<span class="h3"><a class="selflink" id="section-12.3" href="#section-12.3">12.3</a>. Registration of Media Type SMV</span>
Media Type Name: audio
Media Subtype Name: SMV
Required Parameter: none
Optional parameters:
The following parameters apply to RTP transfer only.
ptime: Defined as usual for RTP audio (see <a href="./rfc2327">RFC 2327</a>).
maxptime: The maximum amount of media which can be encapsulated
in each packet, expressed as time in milliseconds. The time
SHALL be calculated as the sum of the time the media present
in the packet represents. The time SHOULD be a multiple of the
duration of a single codec data frame (20 msec). If not
signaled, the default maxptime value SHALL be 200
milliseconds.
maxinterleave: Maximum number for interleaving length (field LLL
in the Interleaving Octet). The interleaving lengths used in
the entire session MUST NOT exceed this maximum value. If not
signaled, the maxinterleave length SHALL be 5.
Encoding considerations:
This type is defined for transfer of SMV-encoded data via RTP
using the Interleaved/Bundled packet format specified in <a href="#section-4.1">Sections</a>
<a href="#section-4.1">4.1</a>, 6, and 7 of <a href="./rfc3558">RFC 3558</a>. It is also defined for other transfer
methods using the storage format specified in Section 11 of <a href="./rfc3558">RFC</a>
<a href="./rfc3558">3558</a>.
Security considerations:
See <a href="#section-14">Section 14</a> "Security Considerations" of <a href="./rfc3558">RFC 3558</a>.
Public specification:
The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0.
Transfer methods are specified in <a href="./rfc3558">RFC 3558</a>.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-18" ></span>
Additional information:
The following information applies to storage format only.
Magic number: #!SMV\n (see <a href="./rfc3558#section-11">Section 11 of RFC 3558</a>)
File extensions: smv, SMV
Macintosh file type code: none
Object identifier or OID: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
Person & email address to contact for further information:
Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
<span class="h3"><a class="selflink" id="section-12.4" href="#section-12.4">12.4</a>. Registration of Media Type SMV0</span>
Media Type Name: audio
Media Subtype Name: SMV0
Required Parameter: none
Optional parameters: none
Encoding considerations:
This type is only defined for transfer of SMV-encoded data via RTP
using the Header-Free packet format specified in <a href="./rfc3558#section-4.2">Section 4.2 of
RFC 3558</a>.
Security considerations:
See <a href="#section-14">Section 14</a> "Security Considerations" of <a href="./rfc3558">RFC 3558</a>.
Public specification:
The SMV vocoder is specified in 3GPP2 C.S0030-0 v2.0. Transfer
methods are specified in <a href="./rfc3558">RFC 3558</a>.
Additional information: none
Intended usage:
COMMON. It is expected that many VoIP applications (as well as
mobile applications) will use this type.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-19" ></span>
Person & email address to contact for further information:
Adam Li
adamli@icsl.ucla.edu
Author/Change controller:
Adam Li
adamli@icsl.ucla.edu
IETF Audio/Video Transport Working Group
<span class="h2"><a class="selflink" id="section-13" href="#section-13">13</a>. Mapping to SDP Parameters</span>
Please note that this section applies to the RTP transfer only.
The information carried in the MIME media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
[<a href="#ref-6" title="&quot;SDP: Session Description Protocol&quot;">6</a>], which is commonly used to describe RTP sessions. When SDP is
used to specify sessions employing the EVRC or SMV codec, the mapping
is as follows:
o The MIME type ("audio") goes in SDP "m=" as the media name.
o The MIME subtype (payload format name) goes in SDP "a=rtpmap"
as the encoding name.
o The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
and "a=maxptime" attributes, respectively.
o The parameter "maxinterleave" goes in the SDP "a=fmtp"
attribute by copying it directly from the MIME media type
string as "maxinterleave=value".
Some examples of SDP session descriptions for EVRC and SMV encodings
follow below.
Example of usage of EVRC:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 EVRC/8000
a=fmtp:97 maxinterleave=2
a=maxptime:80
Example of usage of SMV (Header-Free packet format, subtype SMV0):
m=audio 49122 RTP/AVP 99
a=rtpmap:99 SMV0/8000
a=fmtp:99
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-20" ></span>
Note that the payload format (encoding) names are commonly shown in
upper case. MIME subtypes are commonly shown in lower case. These
names are case-insensitive in both places. Similarly, parameter
names are case-insensitive both in MIME types and in the default
mapping to the SDP a=fmtp attribute.
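A receiver of such a session description might extract the two limits
as sketched below. This is a simplified illustration and not a
general SDP parser: the function name is hypothetical, the description
is assumed to contain a single media section, and fmtp parameters are
assumed to be separated by semicolons. Parameter names are compared
without regard to case, and the defaults are those given for maxptime
and maxinterleave in Section 12.

```python
DEFAULTS = {"maxptime": 200, "maxinterleave": 5}   # used when not signaled


def parse_limits(sdp_text, payload_type):
    """Return maxptime and maxinterleave for one RTP payload type."""
    limits = dict(DEFAULTS)
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("a=maxptime:"):
            limits["maxptime"] = int(line.split(":", 1)[1])
            continue
        head, _, rest = line.partition(" ")
        if head != f"a=fmtp:{payload_type}":
            continue                      # not the fmtp line for this type
        for param in rest.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "maxinterleave":
                limits["maxinterleave"] = int(value)
    return limits


EXAMPLE_SDP = """m=audio 49120 RTP/AVP 97
a=rtpmap:97 EVRC/8000
a=fmtp:97 maxinterleave=2
a=maxptime:80
"""
limits = parse_limits(EXAMPLE_SDP, 97)
```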
<span class="h2"><a class="selflink" id="section-14" href="#section-14">14</a>. Security Considerations</span>
RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification [<a href="#ref-4" title="&quot;RTP: A Transport Protocol for Real-Time Applications&quot;">4</a>], and any appropriate profile (for example [<a href="#ref-5" title="&quot;RTP Profile for Audio and Video Conferences with Minimal Control&quot;">5</a>]).
This implies that confidentiality of the media streams is achieved by
encryption. Because the data compression used with this payload
format is applied end-to-end, encryption may be performed after
compression so there is no conflict between the two operations.
A potential denial-of-service threat exists for data encoding using
compression techniques that have non-uniform receiver-end
computational load. The attacker can inject pathological datagrams
into the stream which are complex to decode and cause the receiver to
become overloaded. However, the encodings covered in this document
do not exhibit any significant non-uniformity.
As with any IP-based protocol, in some circumstances, a receiver may
be overloaded simply by the receipt of too many packets, either
desired or undesired. Network-layer authentication may be used to
discard packets from undesired sources, but the processing cost of
the authentication itself may be too high. In a multicast
environment, pruning of specific sources may be implemented in future
versions of IGMP [<a href="#ref-7" title="&quot;Host Extensions for IP Multicasting&quot;">7</a>] and in multicast routing protocols to allow a
receiver to select which sources are allowed to reach it.
Interleaving may affect encryption. Depending on the encryption
scheme used, there may be restrictions on, for example, the time when keys
can be changed. Specifically, the key change may need to occur at
the boundary between interleave groups.
<span class="h2"><a class="selflink" id="section-15" href="#section-15">15</a>. Adding Support of Other Frame-Based Vocoders</span>
As described above, the RTP packet format defined in this document is
very flexible and designed to be usable by other frame-based
vocoders.
Additional vocoders using this format MUST have properties as
described in <a href="#section-3.3">Section 3.3</a>.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-21" ></span>
For an eligible vocoder to use the payload format mechanisms defined
in this document, a new RTP payload format document needs to be
published as a standards track RFC. That document can simply refer
to this document and then specify the following parameters:
o Define the unit used for RTP time stamp;
o Define the meaning of the Mode Request bits;
o Define corresponding codec data frame type values for ToC;
o Define the conversion procedure for the vocoder's output data frames;
o Define a magic number for storage format, and complete the
corresponding MIME registration.
<span class="h2"><a class="selflink" id="section-16" href="#section-16">16</a>. Acknowledgements</span>
The following authors have made significant contributions to this
document: Adam H. Li, John D. Villasenor, Dong-Seek Park, Jeong-Hoon
Park, Keith Miller, S. Craig Greer, David Leon, Nikolai Leung,
Marcello Lioy, Kyle J. McKay, Magdalena L. Espelien, Randall Gellens,
Tom Hiller, Peter J. McCann, Stinson S. Mathai, Michael D. Turner,
Ajay Rajkumar, Dan Gal, Magnus Westerlund, Lars-Erik Jonsson, Greg
Sherwood, and Thomas Zeng.
<span class="h2"><a class="selflink" id="section-17" href="#section-17">17</a>. References</span>
<span class="h3"><a class="selflink" id="section-17.1" href="#section-17.1">17.1</a> Normative</span>
[<a id="ref-1">1</a>] 3GPP2 C.S0014, "Enhanced Variable Rate Codec, Speech Service
Option 3 for Wideband Spread Spectrum Digital Systems", January
1997.
[<a id="ref-2">2</a>] 3GPP2 C.S0030-0 v2.0, "Selectable Mode Vocoder, Service Option
for Wideband Spread Spectrum Communication Systems", May 2002.
[<a id="ref-3">3</a>] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
[<a id="ref-4">4</a>] Schulzrinne, H., Casner, S., Jacobson, V. and R. Frederick,
"RTP: A Transport Protocol for Real-Time Applications", <a href="./rfc3550">RFC</a>
<a href="./rfc3550">3550</a>, July 2003.
[<a id="ref-5">5</a>] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
Conferences with Minimal Control", <a href="./rfc3551">RFC 3551</a>, July 2003.
[<a id="ref-6">6</a>] Handley, M. and V. Jacobson, "SDP: Session Description
Protocol", <a href="./rfc2327">RFC 2327</a>, April 1998.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-22" ></span>
<span class="h3"><a class="selflink" id="section-17.2" href="#section-17.2">17.2</a> Informative</span>
[<a id="ref-7">7</a>] Deering, S., "Host Extensions for IP Multicasting", STD 5, <a href="./rfc1112">RFC</a>
<a href="./rfc1112">1112</a>, August 1989.
<span class="h2"><a class="selflink" id="section-18" href="#section-18">18</a>. Author's Address</span>
Adam H. Li
Image Communication Lab
Electrical Engineering Department
University of California
Los Angeles, CA 90095
USA
Phone: +1 310 825 5178
EMail: adamli@icsl.ucla.edu
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-23" ></span>
<span class="h2"><a class="selflink" id="section-19" href="#section-19">19</a>. Full Copyright Statement</span>
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
</pre>