Media Resource Control Protocol Version 2 (MRCPv2) (RFC6787)

Original Publication Date: 2012-Nov-01
MRCPv2 is designed to allow a client device to control media processing resources on the network. These resources include speech recognition engines, speech synthesis engines, and speaker verification and speaker identification engines. MRCPv2 enables the implementation of distributed Interactive Voice Response platforms using VoiceXML [W3C.REC-voicexml20-20040316] browsers or other client applications while maintaining separate back-end speech processing capabilities on specialized speech processing servers. MRCPv2 is based on the earlier Media Resource Control Protocol (MRCP) [RFC4463], developed jointly by Cisco Systems, Inc., Nuance Communications, and Speechworks, Inc. Although some of the method names are similar, the way in which these methods are communicated is different, and MRCPv2 defines more resources and more methods for each resource. The first version of MRCP served only as input to the development of this protocol. An MRCPv2 client is not expected to work with an MRCPv1 server, or vice versa, and no migration plan or gateway is defined between the two protocols.

The Media Resource Control Protocol Version 2 (MRCPv2) allows client hosts to control media service resources such as speech synthesizers, recognizers, verifiers, and identifiers residing in servers on the network. MRCPv2 is not a "stand-alone" protocol -- it relies on other protocols, such as the Session Initiation Protocol (SIP), to coordinate MRCPv2 clients and servers and manage sessions between them, and the Session Description Protocol (SDP) to describe, discover, and exchange capabilities. It also depends on SIP and SDP to establish the media sessions and associated parameters between the media source or sink and the media server. Once this is done, the MRCPv2 exchange operates over the control session established above, allowing the client to control the media processing resources on the speech resource server.
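
The division of labor described above can be sketched with the SDP body a client might carry in a SIP INVITE: one "m=" line requests the MRCPv2 control channel and a second describes the audio stream. This is a minimal illustration, not a conforming implementation. The helper name build_mrcpv2_offer, the address 192.0.2.10, and the audio port 49170 are assumptions made for the example. The attribute names (setup, connection, resource, cmid, mid) follow our reading of the SDP usage in RFC 6787 and should be checked against the specification before use.

```python
def build_mrcpv2_offer(client_ip: str,
                       resource: str = "speechsynth",
                       audio_port: int = 49170) -> str:
    """Build an illustrative SDP offer asking for one MRCPv2 resource.

    Hypothetical helper: a real client would hand this body to a SIP
    stack, which sends it in an INVITE to the MRCPv2 server.
    """
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {client_ip}",
        "s=-",
        f"c=IN IP4 {client_ip}",
        "t=0 0",
        # Control channel. The client opens the TCP connection itself
        # (setup:active), so port 9 is only a placeholder in the offer.
        "m=application 9 TCP/MRCPv2 1",
        "a=setup:active",
        "a=connection:new",
        f"a=resource:{resource}",
        # cmid ties this control channel to the media line whose mid is 1.
        "a=cmid:1",
        # Media session. A synthesizer sends audio, so the client receives.
        f"m=audio {audio_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=recvonly",
        "a=mid:1",
    ]
    # SDP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_mrcpv2_offer("192.0.2.10"))
```

In the server's answer, the control "m=" line would carry the actual TCP port and a channel identifier for the allocated resource. Only after that exchange does the client connect and begin sending MRCPv2 requests over the control session.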

