Network Working Group H. Schulzrinne
Request for Comments: 2833 Columbia University
Category: Standards Track S. Petrack
MetaTel
May 2000
RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2000). All Rights Reserved.
Abstract
This memo describes how to carry dual-tone multifrequency (DTMF)
signaling, other tone signals and telephony events in RTP packets.
1 Introduction
This memo defines two payload formats: one for carrying dual-tone
multifrequency (DTMF) digits and other line and trunk signals (Section
3), and a second one for general multi-frequency tones in RTP [1]
packets (Section 4). Separate RTP payload formats are desirable since
low-rate voice codecs cannot be guaranteed to reproduce these tone
signals accurately enough for automatic recognition. Defining
separate payload formats also permits higher redundancy while
maintaining a low bit rate.
The payload formats described here may be useful in at least three
applications: DTMF handling for gateways and end systems, as well as
"RTP trunks". In the first application, the Internet telephony
gateway detects DTMF on the incoming circuits and sends the RTP
payload described here instead of regular audio packets. The gateway
likely has the necessary digital signal processors and algorithms, as
it often needs to detect DTMF, e.g., for two-stage dialing. Having
the gateway detect tones relieves the receiving Internet end system
of this work and also keeps low bit-rate codecs like G.723.1 from
rendering DTMF tones unintelligible. Secondly, an Internet
Schulzrinne & Petrack Standards Track [Page 1]
RFC 2833 Tones May 2000
end system such as an "Internet phone" can emulate DTMF functionality
without concerning itself with generating precise tone pairs and
without imposing the burden of tone recognition on the receiver.
In the "RTP trunk" application, RTP is used to replace a normal
circuit-switched trunk between two nodes. This is particularly of
interest in a telephone network that is still mostly circuit-
switched. In this case, each end of the RTP trunk encodes audio
channels into the appropriate encoding, such as G.723.1 or G.729.
However, this encoding process destroys in-band signaling information
which is carried using the least-significant bit ("robbed bit
signaling") and may also interfere with in-band signaling tones, such
as the MF digit tones. In addition, tone properties such as the phase
reversals in the ANSam tone will not survive speech coding. Thus,
the gateway needs to remove the in-band signaling information from
the bit stream. It can now either carry it out-of-band in a signaling
transport mechanism yet to be defined, or it can use the mechanism
described in this memorandum. (If the two trunk end points are within
reach of the same media gateway controller, the media gateway
controller can also handle the signaling.) Carrying it in-band may
simplify the time synchronization between audio packets and the tone
or signal information. This is particularly relevant where duration
and timing matter, as in the carriage of DTMF signals.
1.1 Terminology
In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and
indicate requirement levels for compliant implementations.
2 Events vs. Tones
A gateway has two options for handling DTMF digits and events. First,
it can simply measure the frequency components of the voice band
signals and transmit this information to the RTP receiver (Section
4). In this mode, the gateway makes no attempt to discern the meaning
of the tones, but simply distinguishes tones from speech signals.
All tone signals in use in the PSTN and meant for human consumption
are sequences of simple combinations of sine waves, either added or
modulated. (There is at least one tone, the ANSam tone [3] used for
indicating data transmission over voice lines, that makes use of
periodic phase reversals.)
As a second option, a gateway can recognize the tones and translate
them into a name, such as ringing or busy tone. The receiver then
produces a tone signal or other indication appropriate to the signal.
Generally, since the recognition of signals often depends on their
on/off pattern or the sequence of several tones, this recognition can
take several seconds. On the other hand, the gateway may have access
to the actual signaling information that generates the tones and thus
can generate the RTP packet immediately, without the detour through
acoustic signals.
In the phone network, tones are generated at different places,
depending on the switching technology and the nature of the tone.
This determines, for example, whether a person making a call to a
foreign country hears the local tones she is familiar with or the
tones used in the country called.
For analog lines, dial tone is always generated by the local switch.
ISDN terminals may generate dial tone locally and then send a Q.931
SETUP message containing the dialed digits. If the terminal just
sends a SETUP message without any Called Party digits, then the
switch does digit collection, provided by the terminal as KEYPAD
messages, and provides dial tone over the B-channel. The terminal can
either use the audio signal on the B-channel or can use the Q.931
messages to trigger locally generated dial tone.
Ringing tone (also called ringback tone) is generated by the local
switch at the callee, with a one-way voice path opened up as soon as
the callee's phone rings. (This reduces the chance of clipping the
called party's response just after answer. It also permits pre-answer
announcements or in-band call-progress indications to reach the
caller before or in lieu of a ringing tone.) Congestion tone and
special information tones can be generated by any of the switches
along the way, and may be generated by the caller's switch based on
ISUP messages received. Busy tone is triggered by the appropriate
ISUP message and is generated by the caller's switch for analog
instruments, or by the ISDN terminal itself.
Gateways which send signaling events via RTP MAY send both named
signals (Section 3) and the tone representation (Section 4) as a
single RTP session, using the redundancy mechanism defined in Section
3.7 to interleave the two representations. It is generally a good
idea to send both, since it allows the receiver to choose the
appropriate rendering.
If a gateway cannot present a tone representation, it SHOULD send the
audio tones as regular RTP audio packets (e.g., as payload format
PCMU), in addition to the named signals.
3 RTP Payload Format for Named Telephone Events
3.1 Introduction
The payload format for named telephone events described below is
suitable for both gateway and end-to-end scenarios. In the gateway
scenario, an Internet telephony gateway connecting a packet voice
network to the PSTN recreates the DTMF tones or other telephony
events and injects them into the PSTN. Since, for example, DTMF digit
recognition takes several tens of milliseconds, the first few
milliseconds of a digit will arrive as regular audio packets. Thus,
careful time and power (volume) alignment between the audio samples
and the events is needed to avoid generating spurious digits at the
receiver.
DTMF digits and named telephone events are carried as part of the
audio stream, and MUST use the same sequence number and time-stamp
base as the regular audio channel to simplify the generation of audio
waveforms at a gateway. The default clock frequency is 8,000 Hz, but
the clock frequency can be redefined when assigning the dynamic
payload type.
The payload format described here achieves a higher redundancy even
in the case of sustained packet loss than the method proposed for the
Voice over Frame Relay Implementation Agreement [4].
If an end system is directly connected to the Internet and does not
need to generate tone signals again, time alignment and power levels
are not relevant. These systems rely on PSTN gateways or Internet end
systems to generate DTMF events and do not perform their own audio
waveform analysis. An example of such a system is an Internet
interactive voice-response (IVR) system.
In circumstances where exact timing alignment between the audio
stream and the DTMF digits or other events is not important and data
is sent unicast, such as the IVR example mentioned earlier, it may be
preferable to use a reliable control protocol rather than RTP
packets. In those circumstances, this payload format would not be
used.
3.2 Simultaneous Generation of Audio and Events
A source MAY send events and coded audio packets for the same time
instants, using events as the redundant encoding for the audio
stream, or it MAY block outgoing audio while event tones are active
and only send named events as both the primary and redundant
encodings.
Note that a period covered by an encoded tone may overlap in time
with a period of audio encoded by other means. This is likely to
occur at the onset of a tone and is necessary to avoid possible
errors in the interpretation of the reproduced tone at the remote
end. Implementations supporting this payload format must be prepared
to handle the overlap. It is RECOMMENDED that gateways only render
the encoded tone since the audio may contain spurious tones
introduced by the audio compression algorithm. However, it is
anticipated that these extra tones in general should not interfere
with recognition at the far end.
3.3 Event Types
This payload format is used for five different types of signals:
o DTMF tones (Section 3.10);
o fax-related tones (Section 3.11);
o standard subscriber line tones (Section 3.12);
o country-specific subscriber line tones (Section 3.13); and
o trunk events (Section 3.14).
A compliant implementation MUST support the events listed in Table 1
with the exception of "flash". If it uses some other, out-of-band
mechanism for signaling line conditions, it does not have to
implement the other events.
In some cases, an implementation may simply ignore certain events,
such as fax tones, that do not make sense in a particular
environment. Section 3.9 specifies how an implementation can use the
SDP "fmtp" parameter within an SDP description to indicate its
inability to understand a particular event or range of events.
Depending on the available user interfaces, an implementation MAY
render all tones in Table 5 the same or, preferably, use the tones
conveyed by the concurrent "tone" payload or other RTP audio payload.
Alternatively, it could provide a textual representation.
Note that end systems that emulate telephones only need to support
the events described in Sections 3.10 and 3.12, while systems that
receive trunk signaling need to implement those in Sections 3.10,
3.11, 3.12 and 3.14, since MF trunks also carry most of the "line"
signals. Systems that do not support fax or modem functionality do
not need to render fax-related events described in Section 3.11.
The RTP payload format is designated as "telephone-event", the MIME
type as "audio/telephone-event". The default timestamp rate is 8000
Hz, but other rates may be defined. In accordance with current
practice, this payload format does not have a static payload type
number, but uses a RTP payload type number established dynamically
and out-of-band.
3.4 Use of RTP Header Fields
Timestamp: The RTP timestamp reflects the measurement point for
the current packet. The event duration described in Section
3.5 extends forwards from that time. The receiver calculates
jitter for RTCP receiver reports based on all packets with a
given timestamp. Note: The jitter value should primarily be
used as a means for comparing the reception quality between
two users or two time-periods, not as an absolute measure.
Marker bit: The RTP marker bit indicates the beginning of a new
event.
3.5 Payload Format
The payload format is shown in Fig. 1.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|R| volume    |          duration             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Payload Format for Named Events
event: The events are encoded as shown in Sections 3.10 through
3.14.
volume: For DTMF digits and other events representable as tones,
this field describes the power level of the tone, expressed
in dBm0 after dropping the sign. Power levels range from 0 to
-63 dBm0. The range of valid DTMF is from 0 to -36 dBm0 (must
accept); lower than -55 dBm0 must be rejected (TR-TSY-000181,
ITU-T Q.24A). Thus, larger values denote lower volume. This
value is defined only for DTMF digits. For other events, it
is set to zero by the sender and is ignored by the receiver.
duration: Duration of this digit, in timestamp units. Thus, the
event began at the instant identified by the RTP timestamp
and has so far lasted as long as indicated by this parameter.
The event may or may not have ended.
For a sampling rate of 8000 Hz, this field is sufficient to
express event durations of up to approximately 8 seconds.
E: If set to a value of one, the "end" bit indicates that this
packet contains the end of the event. Thus, the duration
parameter above measures the complete duration of the event.
A sender MAY delay setting the end bit until retransmitting
the last packet for a tone, rather than on its first
transmission. This avoids having to wait to detect whether
the tone has indeed ended.
Receiver implementations MAY use different algorithms to
create tones, including the two described here. In the first,
the receiver simply places a tone of the given duration in
the audio playout buffer at the location indicated by the
timestamp. As additional packets are received that extend the
same tone, the waveform in the playout buffer is extended
accordingly. (Care has to be taken if audio is mixed, i.e.,
summed, in the playout buffer rather than simply copied.)
Thus, if a packet in a tone lasting longer than the packet
interarrival time gets lost and the playout delay is short, a
gap in the tone may occur. Alternatively, the receiver can
start a tone and play it until it receives a packet with the
"E" bit set, until the next tone arrives (distinguished by a
different timestamp value), or until a given time period
elapses. This is more robust against packet loss, but may
extend the tone if all retransmissions of the last packet in
an event are lost. Limiting the time period of extending the
tone is necessary to keep a tone from "getting stuck".
Regardless of the
algorithm used, the tone SHOULD NOT be extended by more than
three packet interarrival times. A slight extension of tone
durations and shortening of pauses is generally harmless.
R: This field is reserved for future use. The sender MUST set it
to zero, the receiver MUST ignore it.
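The field layout of Fig. 1 can be illustrated with a minimal Python sketch. The function names pack_event and unpack_event are hypothetical and not part of this memo; the sketch assumes the usual RTP network byte order and does nothing beyond range-checking the fields described above.

```python
import struct

CLOCK_HZ = 8000  # default timestamp rate named in this memo


def pack_event(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Pack one named-event payload (Fig. 1) into 4 bytes."""
    if not 0 <= event <= 0xFF:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 0x3F:
        raise ValueError("volume must fit in 6 bits (0 to -63 dBm0, sign dropped)")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E bit (0x80), R bit (0x40, always sent as zero), 6-bit volume.
    flags_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_volume, duration)


def unpack_event(payload: bytes):
    """Return (event, end, volume, duration); the R bit is ignored."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    return event, bool(flags_volume & 0x80), flags_volume & 0x3F, duration


# Longest duration the 16-bit field can express at the default rate, in seconds.
MAX_DURATION_S = 0xFFFF / CLOCK_HZ
```

At 8000 Hz the largest duration value corresponds to a little over 8 seconds, matching the approximate limit given for the duration field.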
3.6 Sending Event Packets
An audio source SHOULD start transmitting event packets as soon as it
recognizes an event, and then every 50 ms thereafter or at the packet
interval of the audio codec used for this session, if known. (The
sender does
not need to maintain precise time intervals between event packets in
order to maintain precise inter-event times, since the timing
information is contained in the timestamp.)
Q.24 [5], Table A-1, indicates that all administrations surveyed
use a minimum signal duration of 40 ms, with signaling velocity
(tone and pause) of no less than 93 ms.
If an event continues for more than one period, the source generating
the events should send a new event packet with the RTP timestamp
value corresponding to the beginning of the event and the duration of
the event increased correspondingly. (The RTP sequence number is
incremented by one for each packet.) If there has been no new event
in the last interval, the event SHOULD be retransmitted three times
or until the next event is recognized. This ensures that the duration
of the event can be recognized correctly even if the last packet for
an event is lost.
DTMF digits and events are sent incrementally to avoid having the
receiver wait for the completion of the event. Since some tones
are two seconds long, this would incur a substantial delay. The
transmitter does not know if event length is important and thus
needs to transmit immediately and incrementally. If the receiver
application does not care about event length, the incremental
transmission mechanism avoids delay. Some applications, such as
gateways into the PSTN, care about both delays and event duration.
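The incremental transmission described in this section might be sketched as follows. This is an illustration under stated assumptions, not a normative procedure: the function name event_packets and the dictionary keys are hypothetical, the first update is assumed to report one full packet interval, the final packet is assumed to be followed by three retransmissions, and the marker bit is assumed to be cleared on retransmitted copies.

```python
def event_packets(start_ts, total_ms, interval_ms=50, clock_hz=8000,
                  first_seq=0, retransmissions=3):
    """Return the list of event packets a sender might emit for one event.

    Every packet carries the timestamp of the start of the event; only the
    duration grows. The sequence number increases by one per packet,
    including retransmissions of the final packet.
    """
    if total_ms <= 0:
        raise ValueError("event must have a positive length")
    packets = []
    seq = first_seq
    elapsed = 0
    while elapsed < total_ms:
        elapsed = min(elapsed + interval_ms, total_ms)
        packets.append({
            "seq": seq,
            "timestamp": start_ts,            # start of the event, never changes
            "marker": not packets,            # set only on the first packet
            "end": elapsed == total_ms,       # E bit on the final update
            "duration": elapsed * clock_hz // 1000,
        })
        seq += 1
    final = packets[-1]
    for _ in range(retransmissions):
        # Assumption: retransmitted copies repeat the final duration and E bit.
        packets.append({**final, "seq": seq, "marker": False})
        seq += 1
    return packets
```

For a 200 ms digit at the default rate, this yields durations of 400, 800, 1200 and 1600 timestamp units, followed by three copies of the final packet.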
3.7 Reliability
During an event, the RTP event payload format provides incremental
updates on the event. The error resiliency depends on the playout
delay at the receiver. For example, for a playout delay of 120 ms and
a packet gap of 50 ms, two packets in a row can get lost without
causing a gap in the tones generated at the receiver.
The audio redundancy mechanism described in RFC 2198 [6] MAY be used
to recover from packet loss across events. The effective data rate is
r times 64 bits (32 bits for the redundancy header and 32 bits for
the telephone-event payload) every 50 ms or r times 1280 bits/second,
where r is the number of redundant events carried in each packet. The
value of r is an implementation trade-off, with a value of 5
suggested.
The timestamp offset in this redundancy scheme has 14 bits, so
that it allows a single packet to "cover" 2.048 seconds of
telephone events at a sampling rate of 8000 Hz. Including the
starting time of previous events allows precise reconstruction of
the tone sequence at a gateway. The scheme is resilient to
consecutive packet losses spanning this interval of 2.048 seconds
or r digits, whichever is less. Note that for previous digits,
only an average loudness can be represented.
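The figures quoted in this section can be re-derived with a few lines of Python. The function names are hypothetical, and the loss-tolerance helper is only one simple reading of the playout-delay example (whole packet intervals that fit into the playout delay), not a statement about any particular implementation.

```python
CLOCK_HZ = 8000            # default timestamp rate
PACKET_INTERVAL_MS = 50    # event packet interval used in the text


def redundancy_bit_rate(r, interval_ms=PACKET_INTERVAL_MS):
    """Bits per second for r redundant events per packet.

    Each event costs 32 bits of redundancy header plus 32 bits of
    telephone-event payload.
    """
    bits_per_packet = r * (32 + 32)
    return bits_per_packet * 1000 // interval_ms


def offset_coverage_seconds(offset_bits=14, clock_hz=CLOCK_HZ):
    """Time span addressable by the redundancy timestamp offset field."""
    return (1 << offset_bits) / clock_hz


def tolerable_consecutive_losses(playout_delay_ms, interval_ms=PACKET_INTERVAL_MS):
    """Assumed reading: whole packet intervals covered by the playout delay."""
    return playout_delay_ms // interval_ms
```

With the suggested value r = 5, the effective rate comes to 6400 bits/second.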
An encoder MAY treat the event payload as a highly-compressed version
of the current audio frame. In that mode, each RTP packet during an
event would contain the current audio codec rendition (say, G.723.1
or G.729) of this digit as well as the representation described in
Section 3.5, plus any previous events seen earlier.
This approach allows dumb gateways that do not understand this
format to function. See also the discussion in Section 1.
3.8 Example
Figure 2 shows a typical RTP packet, sent while the user is dialing
the last digit of the DTMF sequence "911". The first digit was 200 ms
long (1600
timestamp units) and started at time 0, the second digit lasted 250
ms (2000 timestamp units) and started at time 800 ms (6400 timestamp
units), the third digit was pressed at time 1.4 s (11,200 timestamp
units) and the packet shown was sent at 1.45 s (11,600 timestamp
units). The frame duration is 50 ms. To make the parts recognizable,
the figure below ignores byte alignment. Timestamp and sequence
number are assumed to have been zero at the beginning of the first
digit. In this example, the dynamic payload types 96 and 97 have been
assigned for the redundancy mechanism and the telephone event
payload, respectively.
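The values in this example can be reproduced with a short Python sketch. It assumes the RFC 2198 header layout (a 1-bit F flag, 7-bit block payload type, 14-bit timestamp offset and 10-bit block length for redundant blocks, and a single byte for the primary block). All function names are hypothetical; the volume, sequence number and SSRC values are those shown in Figure 2.

```python
import struct


def ms_to_units(ms, clock_hz=8000):
    """Convert milliseconds to RTP timestamp units."""
    return ms * clock_hz // 1000


def redundant_header(block_pt, ts_offset, block_len):
    """4-byte header for a redundant block: F=1, PT, offset, length."""
    if not (0 <= ts_offset < (1 << 14) and 0 <= block_len < (1 << 10)):
        raise ValueError("offset or length out of range")
    word = (1 << 31) | ((block_pt & 0x7F) << 24) | (ts_offset << 10) | block_len
    return struct.pack("!I", word)


def primary_header(block_pt):
    """1-byte header for the primary (last) block: F=0, PT."""
    return struct.pack("!B", block_pt & 0x7F)


def event_payload(event, end, volume, duration):
    """4-byte named-event payload (event, E bit, volume, duration)."""
    return struct.pack("!BBH", event, (0x80 if end else 0) | volume, duration)


now = ms_to_units(1400)      # start of the third digit, also the RTP timestamp
second = ms_to_units(800)    # start of the second digit
sent = ms_to_units(1450)     # time at which the packet is sent

# Fixed RTP header: V=2, no padding/extension/CSRC, M=0, PT=96, seq=28.
rtp_header = struct.pack("!BBHII", 0x80, 96, 28, now, 0x5234A8)

packet = (
    rtp_header
    + redundant_header(97, now - 0, 4)        # first "9", started at time 0
    + redundant_header(97, now - second, 4)   # first "1"
    + primary_header(97)                      # current "1"
    + event_payload(9, True, 7, ms_to_units(200))
    + event_payload(1, True, 10, ms_to_units(250))
    + event_payload(1, False, 20, sent - now)
)
```

The third digit has lasted 400 timestamp units when the packet is sent, which is why its E bit is still clear.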
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   | 2 |0|0|   0   |0|     96      |              28               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |                             11200                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |                            0x5234a8                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |            11200          |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   block PT  |     timestamp offset      |   block length    |
   |1|     97      |   11200 - 6400 = 4800     |         4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|   Block PT  |
   |0|     97      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       9       |1 0|     7     |             1600              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       1       |1 0|    10     |             2000              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     digit     |E R| volume    |          duration             |
   |       1       |0 0|    20     |              400              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Example RTP packet after dialing "911"
3.9 Indication of Receiver Capabilities using SDP
Receivers MAY indicate which named events they can handle, for
example, by using the Session Description Protocol (RFC 2327 [7]).
The payload formats use the following fmtp format to list the event
values that they can receive:
a=fmtp:<format> <list of values>
The list of values consists of comma-separated elements, which can be
either a single decimal number or two decimal numbers separated by a
hyphen (dash), where the second number is larger than the first. No
whitespace is allowed between numbers or hyphens. The list does not
have to be sorted.
For example, if the payload format uses the payload type number 100,
and the implementation can handle the DTMF tones (events 0 through
15) and the dial and ringing tones, it would include the following
description in its SDP message:
a=fmtp:100 0-15,66,70
Since all implementations MUST be able to receive events 0 through
15, listing these events in the a=fmtp line is OPTIONAL.
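A receiver of such a description needs to expand the value list into individual event codes. The following Python sketch shows one way to do so under the rules stated above (comma-separated elements, optional hyphenated ranges with the second number larger than the first, no whitespace). The names parse_event_list and parse_fmtp are hypothetical.

```python
def _number(text):
    """Parse a plain decimal number, rejecting whitespace and signs."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a decimal number: {text!r}")
    return int(text)


def parse_event_list(value):
    """Expand a list such as '0-15,66,70' into a set of event codes."""
    events = set()
    for element in value.split(","):
        if "-" in element:
            low_text, high_text = element.split("-", 1)
            low, high = _number(low_text), _number(high_text)
            if high <= low:
                raise ValueError(f"range must be ascending: {element!r}")
            events.update(range(low, high + 1))
        else:
            events.add(_number(element))
    return events


def parse_fmtp(line):
    """Split 'a=fmtp:<format> <list of values>' into (format, events)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    fmt, _, values = line[len(prefix):].partition(" ")
    return _number(fmt), parse_event_list(values)
```

Since the list does not have to be sorted, the sketch returns a set rather than preserving order.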
The corresponding MIME parameter is "events", so that the following
sample media type definition corresponds to the SDP example above:
audio/telephone-event;events="0-15,66,70";rate="8000"
3.10 DTMF Events
Table 1 summarizes the DTMF-related named events within the
telephone-event payload format.
   Event     encoding (decimal)
   ____________________________
   0--9      0--9
   *         10
   #         11
   A--D      12--15
   Flash     16
Table 1: DTMF named events
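Table 1 translates directly into a lookup table. The Python sketch below is illustrative only; the names DTMF_EVENTS, FLASH_EVENT and digits_to_events are hypothetical, and accepting lower-case letters for A through D is a convenience assumed here.

```python
# Event codes from Table 1: digits 0-9 map to themselves, then * # A B C D.
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})

FLASH_EVENT = 16  # "Flash" has no keypad character, so it is kept separate


def digits_to_events(digits):
    """Map a dialed string such as '911' to a list of event codes."""
    return [DTMF_EVENTS[ch] for ch in digits.upper()]
```

A character outside the table raises KeyError, which a caller may prefer to translate into its own error type.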
3.11 Data Modem and Fax Events
Table 2 summarizes the events and tones that can appear on a
subscriber line serving a fax machine or modem. The tones are
described below, with additional detail in Table 7.
ANS: This 2100 +/- 15 Hz tone is used to disable echo
suppression for data transmission [8,9]. For fax machines,
Recommendation T.30 [9] refers to this tone as called
terminal identification (CED) answer tone.
/ANS: This is the same signal as ANS, except that it reverses
phase at an interval of 450 +/- 25 ms. It disables both
echo cancellers and echo suppressors. (In the ITU
Recommendation V.25 [8], this signal is rendered as ANS
with a bar on top.)
ANSam: The modified answer tone (ANSam) [3] is a sinewave signal
at 2100 +/- 1 Hz without phase reversals, amplitude-modulated
by a sinewave at 15 +/- 0.1 Hz. This tone is sent by modems
if network echo canceller disabling is not required.
/ANSam: The modified answer tone with phase reversals (/ANSam) [3]
is a sinewave signal at 2100 +/- 1 Hz with phase reversals at
intervals of 450 +/- 25 ms, amplitude-modulated by a sinewave
at 15 +/- 0.1 Hz. This tone [10,8] is sent by modems [11] and
faxes to disable echo suppressors.
CNG: After dialing the called fax machine's telephone number (and
before it answers), the calling Group III fax machine
(optionally) begins sending a CalliNG tone (CNG) consisting
of an interrupted tone of 1100 Hz. [9]
CRdi: Capabilities Request (CRd), initiating side, [12] is a
dual-tone signal with tones at 1375 Hz and 2002 Hz for 400
ms, followed by a single tone at 1900 Hz for 100 ms. "This
signal requests the remote station transition from telephony
mode to an information transfer mode and requests the
transmission of a capabilities list message by the remote
station. In particular, CRdi is sent by the initiating
station during the course of a call, or by the calling
station at call establishment in response to a CRe or MRe."
CRdr: CRdr is the response tone to CRdi (see above). It consists
of a dual-tone signal with tones at 1529 Hz and 2225 Hz for
400 ms, followed by a single tone at 1900 Hz for 100 ms.
CRe: Capabilities Request (CRe) [12] is a dual-tone signal with
tones at 1375 Hz and 2002 Hz for 400 ms, followed by
a single tone at 400 Hz for 100 ms. "This signal requests the
remote station transition from telephony mode to an
information transfer mode and requests the transmission of a
capabilities list message by the remote station. In
particular, CRe is sent by an automatic answering station at
call establishment."
CT: "The calling tone [8] consists of a series of interrupted
bursts of binary 1 signal or 1300 Hz, on for a duration of
not less than 0.5 s and not more than 0.7 s and off for a
duration of not less than 1.5 s and not more than 2.0 s."
Modems not starting with the V.8 call initiation tone often
use this tone.
ESi: Escape Signal (ESi) [12] is a dual-tone signal with tones at
1375 Hz and 2002 Hz for 400 ms, followed by a single tone at
980 Hz for 100 ms. "This signal requests the remote station
transition from telephony mode to an information transfer
mode. Signal ESi is sent by the initiating station."
ESr: Escape Signal (ESr) [12] is a dual-tone signal with tones at
1529 Hz and 2225 Hz for 400 ms, followed by a single tone at
1650 Hz for 100 ms. Same as ESi, but sent by the responding
station.
MRdi: Mode Request (MRd), initiating side, [12] is a dual-tone
signal with tones at 1375 Hz and 2002 Hz for 400 ms followed
by a single tone at 1150 Hz for 100 ms. "This signal requests
the remote station transition from telephony mode to an
information transfer mode and requests the transmission of a
mode select message by the remote station. In particular,
signal MRd is sent by the initiating station during the
course of a call, or by the calling station at call
establishment in response to an MRe." [12]
MRdr: MRdr is the response tone to MRdi (see above). It consists
of a dual-tone signal with tones at 1529 Hz and 2225 Hz for
400 ms, followed by a single tone at 1150 Hz for 100 ms.
MRe: Mode Request (MRe) [12] is a dual-tone signal with tones at
1375 Hz and 2002 Hz for 400 ms, followed by a single tone at
650 Hz for 100 ms. "This signal requests the remote station
transition from telephony mode to an information transfer
mode and requests the transmission of a mode select message
by the remote station. In particular, signal MRe is sent by
an automatic answering station at call establishment." [12]
V.21: V.21 describes a 300 b/s full-duplex modem that employs
frequency shift keying (FSK). It is used by Group 3 fax
machines to exchange T.30 information. The calling modem transmits
on channel 1 and receives on channel 2; the answering modem
transmits on channel 2 and receives on channel 1. Each bit
value has a distinct tone, so that V.21 signaling comprises a
total of four distinct tones.
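The V.8bis signals defined above (CRdi through MRe) share one structure: a 400 ms dual-tone segment followed by a 100 ms single-tone segment. The following Python sketch tabulates them using the frequencies given in the definitions and the event codes from Table 3. The names V8BIS_SIGNALS, segments and identify are illustrative and are not defined by this specification.

```python
# Segment 1 dual-tone pairs (Hz), as given in the definitions above.
INITIATING = (1375, 2002)
RESPONDING = (1529, 2225)

# name -> (segment 1 pair, segment 2 frequency in Hz, event code from Table 3)
V8BIS_SIGNALS = {
    "CRdi": (INITIATING, 1900, 41),
    "CRdr": (RESPONDING, 1900, 42),
    "CRe": (INITIATING, 400, 43),
    "ESi": (INITIATING, 980, 44),
    "ESr": (RESPONDING, 1650, 45),
    "MRdi": (INITIATING, 1150, 46),
    "MRdr": (RESPONDING, 1150, 47),
    "MRe": (INITIATING, 650, 48),
}


def segments(name):
    """Return [(frequencies, duration_ms), ...] for a V.8bis signal."""
    pair, single, _code = V8BIS_SIGNALS[name]
    return [(pair, 400), ((single,), 100)]


def identify(pair, single):
    """Return the name of the signal matching the observed tones, or None."""
    for name, (p, s, _code) in V8BIS_SIGNALS.items():
        if set(p) == set(pair) and s == single:
            return name
    return None
```

A detector that has measured both segments could call identify() with the nominal frequencies it matched. Tolerance handling is deliberately left out of this sketch.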
In summary, procedures in Table 2 are used.
Procedure                        Indications
___________________________________________________
V.25 and V.8                     ANS
V.25, echo canceller disabled    ANS, /ANS, ANS, /ANS
V.8                              ANSam
V.8, echo canceller disabled     /ANSam
Table 2: Use of ANS, ANSam and /ANSam in V.x recommendations
Event                        encoding (decimal)
___________________________________________________
Answer tone (ANS)            32
/ANS                         33
ANSam                        34
/ANSam                       35
Calling tone (CNG)           36
V.21 channel 1, "0" bit      37
V.21 channel 1, "1" bit      38
V.21 channel 2, "0" bit      39
V.21 channel 2, "1" bit      40
CRdi                         41
CRdr                         42
CRe                          43
ESi                          44
ESr                          45
MRdi                         46
MRdr                         47
MRe                          48
CT                           49
Table 3: Data and fax named events
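The four answer tones differ only in whether phase reversals and 15 Hz amplitude modulation are present, and in the frequency tolerance around 2100 Hz. As a minimal sketch, assuming a detector has already measured those three properties, the mapping to the event codes in Table 3 could look like this. The function name and its arguments are hypothetical.

```python
# (phase_reversals, amplitude_modulated) -> (name, event code from Table 3)
ANSWER_TONES = {
    (False, False): ("ANS", 32),
    (True, False): ("/ANS", 33),
    (False, True): ("ANSam", 34),
    (True, True): ("/ANSam", 35),
}


def classify_answer_tone(freq_hz, phase_reversals, amplitude_modulated):
    """Map measured tone properties to (name, event code), or None."""
    # ANS and /ANS allow 2100 +/- 15 Hz; ANSam and /ANSam allow 2100 +/- 1 Hz.
    tolerance = 1.0 if amplitude_modulated else 15.0
    if abs(freq_hz - 2100.0) > tolerance:
        return None
    return ANSWER_TONES[(phase_reversals, amplitude_modulated)]
```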
3.12 Line Events
Table 4 summarizes the events and tones that can appear on a
subscriber line.
ITU Recommendation E.182 [13] defines when certain tones should be
used. It defines the following standard tones that are heard by the
caller:
Dial tone: The exchange is ready to receive address information.
PABX internal dial tone: The PABX is ready to receive address
information.
Special dial tone: Same as dial tone, but the caller's line is
subject to a specific condition, such as call diversion or a
voice mail is available (e.g., "stutter dial tone").
Second dial tone: The network has accepted the address
information, but additional information is required.
Ring: This named signal event causes the recipient to generate an
alerting signal ("ring"). The actual tone or other indication
used to render this named event is left up to the receiver.
(This differs from the ringing tone, below, heard by the
caller.)
Ringing tone: The call has been placed to the callee and a calling
signal (ringing) is being transmitted to the callee. This
tone is also called "ringback".
Special ringing tone: A special service, such as call forwarding
or call waiting, is active at the called number.
Busy tone: The called telephone number is busy.
Congestion tone: Facilities necessary for the call are temporarily
unavailable.
Calling card service tone: The calling card service tone consists
of 60 ms of the sum of 941 Hz and 1477 Hz tones (DTMF '#'),
followed by 940 ms of 350 Hz and 440 Hz (U.S. dial tone),
decaying exponentially with a time constant of 200 ms.
Special information tone: The callee cannot be reached, but the
reason is neither "busy" nor "congestion". This tone should
be used before all call failure announcements, for the
benefit of automatic equipment.
Comfort tone: The call is being processed. This tone may be used
during long post-dial delays, e.g., in international
connections.
Hold tone: The caller has been placed on hold.
Record tone: The caller has been connected to an automatic
answering device and is requested to begin speaking.
Caller waiting tone: The called station is busy, but has call
waiting service.
Pay tone: The caller, at a payphone, is reminded to deposit
additional coins.
Positive indication tone: The supplementary service has been
activated.
Negative indication tone: The supplementary service could not be
activated.
Off-hook warning tone: The caller has left the instrument off-hook
for an extended period of time.
The following tones can be heard by either calling or called party
during a conversation:
Call waiting tone: Another party wants to reach the subscriber.
Warning tone: The call is being recorded. This tone is not
required in all jurisdictions.
Intrusion tone: The call is being monitored, e.g., by an operator.
CPE alerting signal: A tone used to alert a device to an arriving
in-band FSK data transmission. A CPE alerting signal is a
combined 2130 and 2750 Hz tone, both with tolerances of 0.5%
and a duration of 80 ms. The CPE alerting signal is
used with ADSI services and Call Waiting ID services [14].
The following tones are heard by operators:
Payphone recognition tone: The person making the call or being
called is using a payphone (and thus it is ill-advised to
allow collect calls to such a person).
Event                         encoding (decimal)
_____________________________________________
Off Hook                      64
On Hook                       65
Dial tone                     66
PABX internal dial tone       67
Special dial tone             68
Second dial tone              69
Ringing tone                  70
Special ringing tone          71
Busy tone                     72
Congestion tone               73
Special information tone      74
Comfort tone                  75
Hold tone                     76
Record tone                   77
Caller waiting tone           78
Call waiting tone             79
Pay tone                      80
Positive indication tone      81
Negative indication tone      82
Warning tone                  83
Intrusion tone                84
Calling card service tone     85
Payphone recognition tone     86
CPE alerting signal (CAS)     87
Off-hook warning tone         88
Ring                          89
Table 4: E.182 line events
3.13 Extended Line Events
Table 5 summarizes country-specific events and tones that can appear
on a subscriber line.
3.14 Trunk Events
Table 6 summarizes the events and tones that can appear on a trunk.
Note that trunks can also carry line events (Section 3.12), as MF
signaling does not include backward signals [15].
ABCD transitional: 4-bit signaling used by digital trunks. For N-
state signaling, the first N values are used.
Event                               encoding (decimal)
___________________________________________________
Acceptance tone                     96
Confirmation tone                   97
Dial tone, recall                   98
End of three party service tone     99
Facilities tone                     100
Line lockout tone                   101
Number unobtainable tone            102
Offering tone                       103
Permanent signal tone               104
Preemption tone                     105
Queue tone                          106
Refusal tone                        107
Route tone                          108
Valid tone                          109
Waiting tone                        110
Warning tone (end of period)        111
Warning Tone (PIP tone)             112
Table 5: Country-specific Line events
The T1 ESF (extended super frame format) allows 2, 4, and 16
state signaling bit options. These signaling bits are named
A, B, C, and D. Signaling information is sent as robbed bits
in frames 6, 12, 18, and 24 when using ESF T1 framing. A D4
superframe only transmits 4-state signaling with A and B
bits. On the CEPT E1 frame, all signaling is carried in
timeslot 16, and two channels of 16-state (ABCD) signaling
are sent per frame.
Since this information is a state rather than a changing
signal, implementations SHOULD use the following triple-
redundancy mechanism, similar to the one specified in ITU-T
Rec. I.366.2 [16], Annex L. At the time of a transition, the
same ABCD information is sent 3 times at an interval of 5 ms.
If another transition occurs during this time, then this
continues. After a period of no change, the ABCD information
is sent every 5 seconds.
Wink: A brief transition, typically 120-290 ms, from on-hook
(unseized) to off-hook (seized) and back to on-hook, used by
the incoming exchange to signal that the call address
signaling can proceed.
Incoming seizure: Incoming indication of call attempt (off-hook).
Event                               encoding (decimal)
__________________________________________________
MF 0... 9                           128...137
MF K0 or KP (start-of-pulsing)      138
MF K1                               139
MF K2                               140
MF S0 to ST (end-of-pulsing)        141
MF S1... S3                         142...143
ABCD signaling (see below)          144...159
Wink                                160
Wink off                            161
Incoming seizure                    162
Seizure                             163
Unseize circuit                     164
Continuity test                     165
Default continuity tone             166
Continuity tone (single tone)       167
Continuity test send                168
Continuity verified                 170
Loopback                            171
Old milliwatt tone (1000 Hz)        172
New milliwatt tone (1004 Hz)        173
Table 6: Trunk events
Seizure: Seizure by answering exchange, in response to outgoing
seizure.
Unseize circuit: Transition of circuit from off-hook to on-hook at
the end of a call.
Wink off: A brief transition, typically 100-350 ms, from off-hook
(seized) to on-hook (unseized) and back to off-hook (seized).
Used in operator services trunks.
Continuity tone send: A tone of 2010 Hz.
Continuity tone detect: A tone of 2010 Hz.
Continuity test send: A tone of 1780 Hz is sent by the calling
exchange. If received by the called exchange, it returns a
"continuity verified" tone.
Continuity verified: A tone of 2010 Hz. This is a response tone,
used in dual-tone procedures.
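The triple-redundancy mechanism for ABCD signaling described earlier in this section can be sketched as a transmission schedule. The Python below is a minimal illustration, not a normative algorithm: it assumes a single transition with no further change, and it assumes that the 5 second refresh timer starts at the last of the three initial transmissions, which the text does not state. The function names are hypothetical.

```python
ABCD_BASE = 144  # first ABCD event code in Table 6 (144...159)


def abcd_event(value):
    """Return the event code for a 4-bit ABCD signaling value (0..15)."""
    if not 0 <= value <= 15:
        raise ValueError("ABCD value must be in 0..15")
    return ABCD_BASE + value


def send_times_ms(transition_ms, until_ms, repeats=3,
                  burst_interval_ms=5, refresh_ms=5000):
    """Times (ms) at which the ABCD state is sent after one transition."""
    # Three transmissions spaced 5 ms apart, starting at the transition.
    times = [transition_ms + i * burst_interval_ms for i in range(repeats)]
    # Assumption: periodic refresh counts from the last burst transmission.
    t = times[-1] + refresh_ms
    while t <= until_ms:
        times.append(t)
        t += refresh_ms
    return times
```

If another transition occurred during the burst, a sender would restart the schedule from the new transition time with the new ABCD value.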
4 RTP Payload Format for Telephony Tones
4.1 Introduction
As an alternative to describing tones and events by name, as
described in Section 3, it is sometimes preferable to describe them
by their waveform properties. In particular, recognition is faster
than for naming signals since it does not depend on recognizing
durations or pauses.
There is no single international standard for telephone tones such as
dial tone, ringing (ringback), busy, congestion ("fast-busy"),
special announcement tones or some of the other special tones, such
as payphone recognition, call waiting or record tone. However, across
all countries, these tones share a number of characteristics [17]:
o Telephony tones consist of either a single tone, the addition
of two or three tones or the modulation of two tones. (Almost
all tones use two frequencies; only the Hungarian "special dial
tone" has three.) Tones that are mixed have the same amplitude
and do not decay.
o Tones for telephony events are in the range of 25 (ringing tone
in Angola) to 1800 Hz. CED is the highest used tone at 2100 Hz.
The telephone frequency range is limited to 3,400 Hz. (The
piano has a range from 27.5 to 4186 Hz.)
o Modulation frequencies range from 15 (ANSam tone) to 480 Hz
(Jamaica). Non-integer frequencies are used only for
frequencies of 16 2/3 and 33 1/3 Hz. (These fractional
frequencies appear to be derived from older AC power grid
frequencies.)
o Tones that are not continuous have durations of less than four
seconds.
o ITU Recommendation E.180 [18] notes that different telephone
companies require a tone accuracy of between 0.5 and 1.5%. The
Recommendation suggests a frequency tolerance of 1%.
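To illustrate how the 1% frequency tolerance suggested by E.180 might be applied, the following Python sketch matches a set of measured frequencies against a few nominal values taken from Table 7. The table subset, the function name, and the choice to compare sorted frequency lists are assumptions made for this example. Frequency alone cannot separate tones that differ only in cadence, such as U.S. busy and congestion tone, so those are left out.

```python
# Nominal frequencies in Hz, taken from Table 7 (subset for illustration).
NOMINAL = {
    "U.S. dial tone": (350, 440),
    "U.S. ringing tone": (440, 480),
    "ITU dial tone": (425,),
}


def match_tone(measured, tolerance=0.01):
    """Return the first tone whose components all lie within tolerance."""
    measured = sorted(measured)
    for name, nominal in NOMINAL.items():
        if len(nominal) != len(measured):
            continue
        if all(abs(m - n) <= tolerance * n for m, n in zip(measured, nominal)):
            return name
    return None
```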
4.2 Examples of Common Telephone Tone Signals
As an aid to the implementor, Table 7 summarizes some common tones.
The rows labeled "ITU ..." refer to the general recommendation of
Recommendation E.180 [18]. Note that there are no specific guidelines
for these tones. In the table, the symbol "+" indicates addition of
the tones, without modulation, while "*" indicates amplitude
modulation. The meaning of some of the tones is described in Section
3.12 or Section 3.11 (for V.21).
Tone name               frequency (Hz)  on period (s)  off period (s)
_____________________________________________________________________
CNG                     1100            0.5            3.0
V.25 CT                 1300            0.5            2.0
CED                     2100            3.3            --
ANS                     2100            3.3            --
ANSam                   2100*15         3.3            --
V.21 "0" bit, ch. 1     1180            0.00333
V.21 "1" bit, ch. 1     980             0.00333
V.21 "0" bit, ch. 2     1850            0.00333
V.21 "1" bit, ch. 2     1650            0.00333
ITU dial tone           425             --             --
U.S. dial tone          350+440         --             --
_____________________________________________________________________
ITU ringing tone        425             0.67--1.5      3--5
U.S. ringing tone       440+480         2.0            4.0
ITU busy tone           425
U.S. busy tone          480+620         0.5            0.5
_____________________________________________________________________
ITU congestion tone     425
U.S. congestion tone    480+620         0.25           0.25
Table 7: Examples of telephony tones
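As an illustration of the "+" notation and the on/off periods in Table 7, the following Python sketch generates samples for a tone made of added sinusoids gated by a cadence. It is a rough sketch under stated assumptions: the per-component amplitude of 0.25 is arbitrary, amplitude modulation ("*") is not covered, and the function name is hypothetical.

```python
import math


def synthesize(frequencies, on_s, off_s, total_s, rate=8000, amplitude=0.25):
    """Return samples of equal-amplitude added tones with an on/off cadence.

    Pass off_s=0 for a continuous tone.
    """
    on_n = round(on_s * rate)    # samples per "on" period
    off_n = round(off_s * rate)  # samples per "off" period
    samples = []
    for n in range(int(total_s * rate)):
        # Short-circuit keeps the modulo from running when off_n is zero.
        silent = off_n > 0 and (n % (on_n + off_n)) >= on_n
        if silent:
            samples.append(0.0)
        else:
            t = n / rate
            samples.append(
                amplitude * sum(math.sin(2 * math.pi * f * t) for f in frequencies)
            )
    return samples
```

For example, synthesize((480, 620), 0.5, 0.5, 1.0) yields one second of U.S. busy tone at the default 8000 Hz rate: 4000 samples of tone followed by 4000 samples of silence.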
4.3 Use of RTP Header Fields
Timestamp: The RTP timestamp reflects the measurement point for
the current packet. The event duration described in Section
3.5 extends forwards from that time.
4.4 Payload Format
Based on the characteristics described above, this document defines
an RTP payload format called "tone" that can represent tones
consisting of one or more frequencies. (The corresponding MIME type
is "audio/tone".) The default timestamp rate is 8,000 Hz, but other
rates may be defined. Note that the timestamp rate does not affect
the interpretation of the frequency, just the durations.
In accordance with current practice, this payload format does not
have a static payload type number, but uses an RTP payload type number
established dynamically and out-of-band.
It is shown in Fig. 3.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| modulation |T| volume | duration |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R R R R| frequency |R R R R| frequency |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R R R R| frequency |R R R R| frequency |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
......
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R R R R| frequency |R R R R| frequency |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: Payload format for tones
The payload contains the following fields:
modulation: The modulation frequency, in Hz. The field is a 9-bit
unsigned integer, allowing modulation frequencies up to 511
Hz. If there is no modulation, this field has a value of
zero.
T: If the "T" bit is set (one), the modulation frequency is to be
divided by three. Otherwise, the modulation frequency is
taken as is.
This bit allows frequencies accurate to 1/3 Hz, since
modulation frequencies such as 16 2/3 Hz are in practical
use.
volume: The power level of the tone, expressed in dBm0 after
dropping the sign, with range from 0 to -63 dBm0. (Note: A
preferred level range for digital tone generators is -8 dBm0
to -3 dBm0.)
duration: The duration of the tone, measured in timestamp units.
The tone begins at the instant identified by the RTP
timestamp and lasts for the duration value.
The definition of duration corresponds to that for sample-
based codecs, where the timestamp represents the sampling
point for the first sample.
frequency: The frequencies of the tones to be added, measured in
Hz and represented as a 12-bit unsigned integer. The field
size is sufficient to represent frequencies up to 4095 Hz,
which exceeds the range of telephone systems. A value of zero
indicates silence. A single tone can contain any number of
frequencies.
R: This field is reserved for future use. The sender MUST set it
to zero, the receiver MUST ignore it.
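The field layout above can be exercised with a short Python sketch that packs and unpacks a tone payload. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the volume argument is the unsigned field value (5 means -5 dBm0), and padding for an odd number of frequencies is not addressed.

```python
import struct
from fractions import Fraction


def pack_tone(frequencies, duration, volume, modulation=0, divide_by_three=False):
    """Pack a tone payload: modulation(9) T(1) volume(6) duration(16), then
    one 16-bit word per frequency (4 reserved bits + 12-bit frequency)."""
    if not 0 <= modulation <= 511:
        raise ValueError("modulation must fit in 9 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    first = (modulation << 7) | (int(divide_by_three) << 6) | volume
    payload = struct.pack("!HH", first, duration)
    for f in frequencies:
        if not 0 <= f <= 4095:
            raise ValueError("frequency must fit in 12 bits")
        payload += struct.pack("!H", f)  # reserved (R) bits are sent as zero
    return payload


def unpack_tone(payload):
    """Unpack a tone payload into a dict; reserved bits are ignored."""
    first, duration = struct.unpack_from("!HH", payload, 0)
    t_bit = (first >> 6) & 1
    # With T set, the modulation frequency is divided by three.
    modulation = Fraction(first >> 7, 3 if t_bit else 1)
    frequencies = [
        struct.unpack_from("!H", payload, offset)[0] & 0x0FFF
        for offset in range(4, len(payload), 2)
    ]
    return {
        "modulation": modulation,
        "volume": first & 0x3F,
        "duration": duration,
        "frequencies": frequencies,
    }
```

For example, pack_tone([440, 480], duration=12000, volume=5) produces the 8-octet payload for 1.5 seconds of U.S. ringing tone at the default 8000 Hz timestamp rate. A modulation frequency of 16 2/3 Hz would be sent as modulation=50 with the T bit set.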
4.5 Reliability
This payload format uses the reliability mechanism described in
Section 3.7.
5 Combining Tones and Named Events
The payload formats in Sections 3 and 4 can be combined into a single
payload using the method specified in RFC 2198. Fig. 4 shows an
example. In that example, the RTP packet combines two "tone" and one
"telephone-event" payloads. The payload types are chosen arbitrarily
as 97 and 98, respectively, with a sample rate of 8000 Hz. Here, the
redundancy format has the dynamic payload type 96.
The packet represents a snapshot of U.S. ringing tone, 1.5 seconds
(12,000 timestamp units) into the second "on" part of the 2.0/4.0
second cadence, i.e., a total of 7.5 seconds (60,000 timestamp units)
into the ring cycle. The 440 + 480 Hz tone of this second cadence
started at RTP timestamp 48,000. Four seconds of silence preceded it,
but since RFC 2198 only has a fourteen-bit offset, only 2.05 seconds
(16383 timestamp units) can be represented. Even though the tone
sequence is not complete, the sender was able to determine that this
is indeed ringback, and thus includes the corresponding named event.
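The block headers for this example can be reproduced with a few lines of Python. The sketch below follows the RFC 2198 header layout (F bit, 7-bit block payload type, 14-bit timestamp offset, 10-bit block length, with a one-octet header for the final block). The function names are hypothetical, and only the headers are built, not the RTP header or the payload blocks.

```python
import struct

MAX_OFFSET = (1 << 14) - 1  # 16383 timestamp units, about 2.05 s at 8000 Hz


def clamp_offset(units):
    """Limit a timestamp offset to what the 14-bit field can carry."""
    return min(units, MAX_OFFSET)


def redundant_header(block_pt, timestamp_offset, block_length):
    """4-octet header for a redundant block (F bit set to 1)."""
    if not 0 <= block_pt <= 127:
        raise ValueError("block PT must fit in 7 bits")
    if not 0 <= timestamp_offset <= MAX_OFFSET:
        raise ValueError("timestamp offset must fit in 14 bits")
    if not 0 <= block_length <= 1023:
        raise ValueError("block length must fit in 10 bits")
    word = (1 << 31) | (block_pt << 24) | (timestamp_offset << 10) | block_length
    return struct.pack("!I", word)


def primary_header(block_pt):
    """1-octet header for the final (primary) block (F bit set to 0)."""
    if not 0 <= block_pt <= 127:
        raise ValueError("block PT must fit in 7 bits")
    return struct.pack("!B", block_pt)


# Headers for the example: telephone-event (PT 98, 4 octets), then the
# silence "tone" block (PT 97, 8 octets), then the primary tone block (PT 97).
silence_units = clamp_offset(4 * 8000)  # 4 s of silence exceeds 14 bits
headers = (
    redundant_header(98, silence_units, 4)
    + redundant_header(97, silence_units, 8)
    + primary_header(97)
)
```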
6 MIME Registration
6.1 audio/telephone-event
MIME media type name: audio
MIME subtype name: telephone-event
Required parameters: none.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| V |P|X|  CC   |M|     PT      |       sequence number         |
| 2 |0|0|   0   |0|     96      |              31               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
|                             48000                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
|                            0x5234a8                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |  timestamp offset         |   block length    |
|1|     98      |           16383           |         4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   block PT  |  timestamp offset         |   block length    |
|1|     97      |           16383           |         8         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Block PT  |
|0|     97      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  event=ring   |0|0| volume=0  |     duration=28383            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  modulation=0   |0| volume=63 |     duration=16383            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|      frequency=0      |0 0 0 0|      frequency=0      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  modulation=0   |0| volume=5  |     duration=12000            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0|     frequency=440     |0 0 0 0|     frequency=480     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Combining tones and events in a single RTP packet
Optional parameters: The "events" parameter lists the events
supported by the implementation. Events are listed as one or
more comma-separated elements. Each element can either be a
single integer or two integers separated by a hyphen. No
white space is allowed in the argument. The integers
designate the event numbers supported by the implementation.
All implementations MUST support events 0 through 15, so that
the parameter can be omitted if the implementation only
supports these events.
The "rate" parameter describes the sampling rate, in Hertz.
The number is written as a floating point number or as an
integer. If omitted, the default value is 8000 Hz.
Encoding considerations: This type is only defined for transfer
via RTP [1].
Security considerations: See the "Security Considerations"
(Section 7) section in this document.
Interoperability considerations: none
Published specification: This document.
Applications which use this media: The telephone-event audio
subtype supports the transport of events occurring in
telephone systems over the Internet.
Additional information:
1. Magic number(s): N/A
2. File extension(s): N/A
3. Macintosh file type code: N/A
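The syntax of the "events" parameter described above (comma-separated elements, each a single integer or two integers separated by a hyphen, with no white space) can be parsed with a small Python sketch. The function name is hypothetical, and treating an omitted parameter as events 0 through 15 follows the rule that the parameter may be omitted when only those events are supported.

```python
def parse_events(value=None):
    """Return the set of event numbers listed in an "events" parameter."""
    if value is None:
        # Parameter omitted: only the mandatory events 0 through 15.
        return set(range(16))
    if any(ch.isspace() for ch in value):
        raise ValueError("white space is not allowed in the events parameter")
    events = set()
    for element in value.split(","):
        first, sep, last = element.partition("-")
        if not sep:
            last = first  # a single integer is a range of one
        for part in (first, last):
            if not (part.isascii() and part.isdigit()):
                raise ValueError("invalid element: %r" % element)
        low, high = int(first), int(last)
        if high < low:
            raise ValueError("descending range: %r" % element)
        events.update(range(low, high + 1))
    return events
```

For example, parse_events("0-15,66,70") returns the sixteen mandatory events plus dial tone (66) and ringing tone (70).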
6.2 audio/tone
MIME media type name: audio
MIME subtype name: tone
Required parameters: none
Optional parameters: The "rate" parameter describes the sampling
rate, in Hertz. The number is written as a floating point
number or as an integer. If omitted, the default value is
8000 Hz.
Encoding considerations: This type is only defined for transfer
via RTP [1].
Security considerations: See the "Security Considerations"
(Section 7) section in this document.
Interoperability considerations: none
Published specification: This document.
Applications which use this media: The tone audio subtype supports
the transport of pure composite tones, for example those
commonly used in the current telephone system to signal call
progress.
Additional information:
1. Magic number(s): N/A
2. File extension(s): N/A
3. Macintosh file type code: N/A
7 Security Considerations
RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP
specification (RFC 1889 [1]), and any appropriate RTP profile (for
example RFC 1890 [19]). This implies that confidentiality of the media
streams is achieved by encryption. Because the data compression used
with this payload format is applied end-to-end, encryption may be
performed after compression so there is no conflict between the two
operations.
This payload type does not exhibit any significant non-uniformity in
receiver-side computational complexity for packet processing that
could cause a potential denial-of-service threat.
In older networks employing in-band signaling and lacking appropriate
tone filters, the tones in Section 3.14 may be used to commit toll
fraud.
Additional security considerations are described in RFC 2198 [6].
8 IANA Considerations
This document defines two new RTP payload formats, named telephone-
event and tone, and associated Internet media (MIME) types,
audio/telephone-event and audio/tone.
Within the audio/telephone-event type, additional events MUST be
registered with IANA. Registrations are subject to approval by the
current chair of the IETF audio/video transport working group, or by
an expert designated by the transport area director if the AVT group
has closed.
The meaning of new events MUST be documented either as an RFC or an
equivalent standards document produced by another standardization
body, such as ITU-T.
9 Acknowledgements
The suggestions of the Megaco working group are gratefully
acknowledged. Detailed advice and comments were provided by Fred
Burg, Steve Casner, Fatih Erdin, Bill Foster, Mike Fox, Gunnar
Hellstrom, Terry Lyons, Steve Magnell, Vern Paxson and Colin Perkins.
10 Authors' Addresses
Henning Schulzrinne
Dept. of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027
USA
EMail: schulzrinne@cs.columbia.edu
Scott Petrack
MetaTel
45 Rumford Avenue
Waltham, MA 02453
USA
EMail: scott.petrack@metatel.com
11 Bibliography
[1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", RFC
1889, January 1996.
[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[3] International Telecommunication Union, "Procedures for starting
sessions of data transmission over the public switched telephone
network," Recommendation V.8, Telecommunication Standardization
Sector of ITU, Geneva, Switzerland, Feb. 1998.
[4] R. Kocen and T. Hatala, "Voice over frame relay implementation
agreement", Implementation Agreement FRF.11, Frame Relay Forum,
Foster City, California, Jan. 1997.
[5] International Telecommunication Union, "Multifrequency push-
button signal reception," Recommendation Q.24, Telecommunication
Standardization Sector of ITU, Geneva, Switzerland, 1988.
[6] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
for Redundant Audio Data", RFC 2198, September 1997.
[7] Handley M. and V. Jacobson, "SDP: Session Description Protocol",
RFC 2327, April 1998.
[8] International Telecommunication Union, "Automatic answering
equipment and general procedures for automatic calling equipment
on the general switched telephone network including procedures
for disabling of echo control devices for both manually and
automatically established calls," Recommendation V.25,
Telecommunication Standardization Sector of ITU, Geneva,
Switzerland, Oct. 1996.
[9] International Telecommunication Union, "Procedures for document
facsimile transmission in the general switched telephone
network," Recommendation T.30, Telecommunication Standardization
Sector of ITU, Geneva, Switzerland, July 1996.
[10] International Telecommunication Union, "Echo cancellers,"
Recommendation G.165, Telecommunication Standardization Sector
of ITU, Geneva, Switzerland, Mar. 1993.
[11] International Telecommunication Union, "A modem operating at
data signaling rates of up to 33 600 bit/s for use on the
general switched telephone network and on leased point-to-point
2-wire telephone-type circuits," Recommendation V.34,
Telecommunication Standardization Sector of ITU, Geneva,
Switzerland, Feb. 1998.
[12] International Telecommunication Union, "Procedures for the
identification and selection of common modes of operation
between data circuit-terminating equipments (DCEs) and between
data terminal equipments (DTEs) over the public switched
telephone network and on leased point-to-point telephone-type
circuits," Recommendation V.8bis, Telecommunication
Standardization Sector of ITU, Geneva, Switzerland, Sept. 1998.
[13] International Telecommunication Union, "Application of tones and
recorded announcements in telephone services," Recommendation
E.182, Telecommunication Standardization Sector of ITU, Geneva,
Switzerland, Mar. 1998.
[14] Bellcore, "Functional criteria for digital loop carrier
systems," Technical Requirement TR-NWT-000057, Telcordia
(formerly Bellcore), Morristown, New Jersey, Jan. 1993.
[15] J. G. van Bosse, Signaling in Telecommunications Networks.
Telecommunications and Signal Processing, New York, New York:
Wiley, 1998.
[16] International Telecommunication Union, "AAL type 2 service
specific convergence sublayer for trunking," Recommendation
I.366.2, Telecommunication Standardization Sector of ITU,
Geneva, Switzerland, Feb. 1999.
[17] International Telecommunication Union, "Various tones used in
national networks," Recommendation Supplement 2 to
Recommendation E.180, Telecommunication Standardization Sector
of ITU, Geneva, Switzerland, Jan. 1994.
[18] International Telecommunication Union, "Technical
characteristics of tones for telephone service," Recommendation
Supplement 2 to Recommendation E.180, Telecommunication
Standardization Sector of ITU, Geneva, Switzerland, Jan. 1994.
[19] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
with Minimal Control", RFC 1890, January 1996.
12 Full Copyright Statement
Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.