File: section.nroff

package info (click to toggle)
xml2rfc 2.4.8-1
links: PTS
area: non-free
in suites: jessie, jessie-kfreebsd
size: 5,664 kB
ctags: 461
sloc: xml: 20,989; python: 4,566; makefile: 11; perl: 6
file content (296 lines) | stat: -rw-r--r-- 12,905 bytes
parent folder | download | duplicates (6)
.in 4
.ti 0

1.  Document Conventions
.in 3

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

Since many of the definitions and syntax are identical to those for
the Hypertext Transfer Protocol \%-- HTTP/1.1 [RFC2616], this
specification refers to the section where they are defined rather
than copying it.  For brevity, [HX.Y] is to be taken to refer to
Section\0X.Y of RFC 2616.

All the mechanisms specified in this document are described in both
prose and an augmented \%Backus-Naur form (ABNF [RFC5234]).

The complete message format in ABNF form is provided in [S.abnf] and
is the normative format definition.  Note that productions may be
duplicated within the main body of the document for reading
convenience.  If a production in the body of the text conflicts with
one in the normative definition, the latter rules.
.in 6
.ti 0

1.1.  Definitions
.in 18
.ti 3

Media Resource
.sp 0
An entity on the speech processing server that can be
controlled through MRCPv2.
.ti 3

MRCP Server
.sp 0
Aggregate of one or more "Media Resource" entities on
a server, exposed through MRCPv2.  Often, 'server' in
this document refers to an MRCP server.
.ti 3

MRCP Client
.sp 0
An entity controlling one or more Media Resources
through MRCPv2 ("Client" for short).
.ti 3

DTMF
.sp 0
\%Dual-Tone \%Multi-Frequency; a method of transmitting
key presses \%in-band, either as actual tones (Q.23
[Q.23]) or as named tone events (RFC 4733 [RFC4733]).
.ti 3

Endpointing
.sp 0
The process of automatically detecting the beginning
and end of speech in an audio stream.  This is
critical both for speech recognition and for automated
recording as one would find in voice mail systems.
.ti 3

Hotword Mode
.sp 0
A mode of speech recognition where a stream of
utterances is evaluated for match against a small set
of command words.  This is generally employed either
to trigger some action or to control the subsequent
grammar to be used for further recognition.
.in 6
.ti 0

1.2.  Nroff \%Beginning-of-Line Character Escaping
.in 3

\&'Line starting with single apostrophe'.

\&... Line starting with period.
.in 6
.ti 0

1.3.  A Section\0Title which is so long that it will extend to the next
Line.
.ti 0

1.4.  \%State-Machine Diagrams
.in 3

The \%state-machine diagrams in this document do not show every
possible method call.  Rather, they reflect the state of the resource
based on the methods that have moved to \%IN-PROGRESS or COMPLETE
states (see [sec.response]).  Note that since PENDING requests
essentially have not affected the resource yet and are in the queue
to be processed, they are not reflected in the \%state-machine
diagrams.
.in 6
.ti 0

1.5.  URI Schemes
.in 3

This document defines many protocol headers that contain URIs
(Uniform Resource Identifiers [RFC3986]) or lists of URIs for
referencing media.  The entire document, including the Security
Considerations section (Section\012), assumes that HTTP or HTTP over
TLS (HTTPS) [RFC2818] will be used as the URI addressing scheme
unless otherwise stated.  However, implementations MAY support other
schemes (such as 'file'), provided they have addressed any security
considerations described in this document and any others particular
to the specific scheme.  For example, implementations where the
client and server both reside on the same physical hardware and the
file system is secured by traditional file access controls on \%user-
level could be reasonable candidates for supporting the 'file'
scheme.
.in 4
.ti 0

2.  Managing Resource Control Channels
.in 3

The client needs a separate MRCPv2 resource control channel to
control each media processing resource under the SIP dialog.  A
unique channel identifier string identifies these resource control
channels.  The channel identifier is a \%difficult-to-guess,
unambiguous string followed by an "@", then by a string token
specifying the type of resource.  The server generates the channel
identifier and MUST make sure it does not clash with the identifier
of any other MRCP channel currently allocated by that server.  MRCPv2
defines the following \%IANA-registered types of media processing
resources.  Additional resource types and their associated methods/
events and state machines may be added as described below in
[sec.iana].
.in 0
.nf

   +---------------+----------------------+---------------------------+
   | Resource Type | Resource Description | Described in              |
   +---------------+----------------------+---------------------------+
   | speechrecog   | Speech Recognizer    | [sec.recognizerResource]  |
   | dtmfrecog     | DTMF Recognizer      | [sec.recognizerResource]  |
   | speechsynth   | Speech Synthesizer   | [sec.synthesizerResource] |
   | basicsynth    | Basic Synthesizer    | [sec.synthesizerResource] |
   | speakverify   | Speaker Verification | [sec.verifierResource]    |
   | recorder      | Speech Recorder      | [sec.recorderResource]    |
   +---------------+----------------------+---------------------------+
.fi
.in 3
.ce 1

Table\01: Resource Types

The SIP INVITE or \%re-INVITE transaction and the SDP offer/answer
exchange it carries contain "m=" lines describing the resource
control channel to be allocated.  There MUST be one SDP "m=" line for
each MRCPv2 resource to be used in the session.  This "m=" line MUST
have a media type field of "application" and a transport type field
of either "TCP/MRCPv2" or "TCP/TLS/MRCPv2".  The port number field of
the "m=" line MUST contain the "discard" port of the transport
protocol (port 9 for TCP) in the SDP offer from the client and MUST
contain the TCP listen port on the server in the SDP answer.  The
client may then either set up a TCP or TLS connection to that server
port or share an already established connection to that port.  Since
MRCPv2 allows multiple sessions to share the same TCP connection,
multiple "m=" lines in a single SDP document MAY share the same port
field value; MRCPv2 servers MUST NOT assume any relationship between
resources using the same port other than the sharing of the
communication channel.

MRCPv2 resources do not use the port or format field of the "m=" line
to distinguish themselves from other resources using the same
channel.  The client MUST specify the resource type identifier in the
resource attribute associated with the control "m=" line of the SDP
offer.  The server MUST respond with the full \%Channel-Identifier
(which includes the resource type identifier and a \%difficult-to-
guess, unambiguous string) in the "channel" attribute associated with
the control "m=" line of the SDP answer.  To remain backwards
compatible with conventional SDP usage, the format field of the "m="
line MUST have the arbitrarily selected value of "1".

When the client wants to add a media processing resource to the
session, it issues a new SDP offer, according to the procedures of
RFC 3264 [RFC3264], in a SIP \%re-INVITE request.  The SDP offer/answer
exchange carried by this SIP transaction contains one or more
additional control "m=" lines for the new resources to be allocated
to the session.  The server, on seeing the new "m=" line, allocates
the resources (if they are available) and responds with a
corresponding control "m=" line in the SDP answer carried in the SIP
response.  If the new resources are not available, the \%re-INVITE
receives an error message, and existing media processing going on
before the \%re-INVITE will continue as it was before.  It is not
possible to allocate more than one resource of each type.  If a
client requests more than one resource of any type, the server MUST
behave as if the resources of that type (beyond the first one) are
not available.

MRCPv2 clients and servers using TCP as a transport protocol MUST use
the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
connection, with the considerations described hereby.  Similarly,
MRCPv2 clients and servers using TCP/TLS as a transport protocol MUST
use the procedures specified in RFC 4572 [RFC4572] for setting up the
TLS connection, with the considerations described hereby.  The
a=setup attribute, as described in RFC 4145 [RFC4145], MUST be
"active" for the offer from the client and MUST be "passive" for the
answer from the MRCPv2 server.  The a=connection attribute MUST have
a value of "new" on the very first control "m=" line offer from the
client to an MRCPv2 server.  Subsequent control "m=" line offers from
the client to the MRCP server MAY contain "new" or "existing",
depending on whether the client wants to set up a new connection or
share an existing connection, respectively.  If the client specifies
a value of "new", the server MUST respond with a value of "new".  If
the client specifies a value of "existing", the server MUST respond.
The legal values in the response are "existing" if the server prefers
to share an existing connection or "new" if not.  In the latter case,
the client MUST initiate a new transport connection.

When the client wants to deallocate the resource from this session,
it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
control "m=" line port MUST be set to 0.  This SDP offer is sent in a
SIP \%re-INVITE request.  This deallocates the associated MRCPv2
identifier and resource.  The server MUST NOT close the TCP or TLS
connection if it is currently being shared among multiple MRCP
channels.  When all MRCP channels that may be sharing the connection
are released and/or the associated SIP dialog is terminated, the
client or server terminates the connection.

When the client wants to tear down the whole session and all its
resources, it MUST issue a SIP BYE request to close the SIP session.
This will deallocate all the control channels and resources allocated
under the session.

All servers MUST support TLS.  Servers MAY use TCP without TLS in
controlled environments (e.g., not in the public Internet) where both
nodes are inside a protected perimeter, for example, preventing
access to the MRCP server from remote nodes outside the controlled
perimeter.  It is up to the client, through the SDP offer, to choose
which transport it wants to use for an MRCPv2 session.  Aside from
the exceptions given above, when using TCP, the "m=" lines MUST
conform to RFC4145 [RFC4145], which describes the usage of SDP for
\%connection-oriented transport.  When using TLS, the SDP "m=" line for
the control stream MUST conform to \%Connection-Oriented Media
(COMEDIA) over TLS [RFC4572], which specifies the usage of SDP for
establishing a secure \%connection-oriented transport over TLS.
.in 4
.ti 0

3.  Section\0Three
.in 6
.ti 0

3.1.  Synthesizer Message Body
.in 3

A synthesizer message can contain additional information associated
with the Request, Response, or Event in its message body.
.in 8
.ti 0

3.1.1.  Synthesizer Speech Data
.in 3

\%Marked-up text for the synthesizer to speak is specified as a typed
media entity in the message body.  The speech data to be spoken by
the synthesizer can be specified inline by embedding the data in the
message body or by reference by providing a URI for accessing the
data.  In either case, the data and the format used to markup the
speech needs to be of a content type supported by the server.

All MRCPv2 servers containing synthesizer resources MUST support both
plain text speech data and W3C's Speech Synthesis Markup Language
\%[W3C.REC-speech-synthesis-20040907] and hence MUST support the media
types 'text/plain' and 'application/ssml+xml'.  Other formats MAY be
supported.

If the speech data is to be fetched by URI reference, the media type
\%\&'text/uri-list' (see RFC 2483 [RFC2483]) is used to indicate one or
more URIs that, when dereferenced, will contain the content to be
spoken.  If a list of speech URIs is specified, the resource MUST
speak the speech data provided by each URI in the order in which the
URIs are specified in the content.

MRCPv2 clients and servers MUST support the 'multipart/mixed' media
type.  This is the appropriate media type to use when providing a mix
of URI and inline speech data.  Embedded within the multipart content
block, there MAY be content for the \%'text/uri-list', 'application/
ssml+xml', and/or 'text/plain' media types.  The character set and
encoding used in the speech data is specified according to standard
media type definitions.  The multipart content MAY also contain
actual audio data.  Clients may have recorded audio clips stored in
memory or on a local device and wish to play it as part of the SPEAK
request.  The audio portions MAY be sent by the client as part of the
multipart content block.  This audio is referenced in the speech
markup data that is another part in the multipart content block
according to the 'multipart/mixed' media type specification.