Browse Prior Art Database

Method for automation of VoiceXML Browser testing Disclosure Number: IPCOM000021304D
Original Publication Date: 2004-Jan-13
Included in the Prior Art Database: 2004-Jan-13
Document File: 4 page(s) / 14K

Publishing Venue



This article describes an approach to automated VoiceXML browser testing, built on the W3C Voice Browser Working Group-defined "txml" language.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 48% of the total text.

Page 1 of 4

Method for automation of VoiceXML Browser testing


The World Wide Web Consortium's (W3C) VoiceXML 2.0 Implementation Report consists of a suite of more than 600 test cases. As delivered, each test is fairly simple to run manually: install the test suite on a web server, run the supplied XSL transformations to generate VoiceXML and CGI scripts, and run a VoiceXML browser against the resulting VoiceXML applications. Running such a large number of tests by hand, however, is a time-consuming, error-prone and labor-intensive process.
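The manual steps above can be batched with a few lines of script. The sketch below stubs out the XSL transform and uses hypothetical directory and file names; it only illustrates the shape of the pipeline, not the actual W3C tooling:

```python
# Sketch: transform every txml test into VoiceXML under a web-server
# document root, then a browser can be pointed at the results.
# The transform itself is stubbed; names are assumptions.
import pathlib
import tempfile

def build_suite(src_dir: pathlib.Path, out_dir: pathlib.Path, transform):
    """Apply `transform` to each .txml file, writing a .vxml beside it."""
    out_dir.mkdir(parents=True, exist_ok=True)
    built = []
    for txml in sorted(src_dir.glob("*.txml")):
        vxml = out_dir / (txml.stem + ".vxml")
        vxml.write_text(transform(txml.read_text()))
        built.append(vxml)
    return built

# Tiny demonstration with one fake test file.
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "tests").mkdir()
    (root / "tests" / "assert1.txml").write_text("<vxml/>")
    built = build_suite(root / "tests", root / "htdocs", lambda s: s)
    names = [p.name for p in built]
print(names)  # -> ['assert1.vxml']
```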

Running the tests automatically is also possible, but not quite as simple. This document describes the mechanism we've used to automate the running of these tests.

The Tests Themselves

Each of the supplied tests is written using a combination of VoiceXML 2.0 and a test-specific markup defined by the W3C Voice Browser Working Group, called "txml". Txml is intended to replace all input and output VoiceXML tags (<prompt>, <grammar>, etc.) within an application, allowing one to write language-neutral tests. Consider the following txml-based sample:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="" xmlns:conf="">
  <form id="sample">
    <field name="test">
      <conf:speech value="alpha"/>
      <conf:grammar utterance="alpha"/>
      <filled>
        <conf:comment>You said <value expr="test"/></conf:comment>
        <conf:pass/>
      </filled>
    </field>
  </form>
</vxml>

Notice that it contains no <prompt> or <grammar> tags. Using the XSL stylesheet supplied with the implementation report, the <field> above would be translated into:

<field name="test">

  <prompt count="1"> Say 'Chicago'. </prompt>

  <prompt count="2"> Say 'Chicago' again. </prompt>

  <prompt count="3"> Say 'Chicago' one more time. </prompt>

  <grammar type="application/srgs+xml"
           root="CityNameid98359" version="1.0">
    <rule id="CityNameid98359" scope="public">
      <item> chicago </item>
    </rule>
  </grammar>

  <filled>
    <log>You said <value expr="test"/></log>
  </filled>

</field>
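The substitution the stylesheet performs can be approximated in a few lines of script. The sketch below is not the W3C stylesheet: the conf namespace URI, the generated prompt wording, and the omission of generated rule ids are all placeholders, but it shows the same conf:* → VoiceXML expansion in miniature:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace; the real URI is defined by the W3C test suite.
CONF = "urn:example:conf"

def translate_field(field):
    """Replace txml conf:* test tags in a <field> with concrete
    VoiceXML <prompt>/<grammar> elements (illustrative only)."""
    for child in list(field):
        if child.tag == f"{{{CONF}}}speech":
            # conf:speech names what the test driver should say;
            # generate a prompt asking for that utterance.
            word = child.get("value")
            field.remove(child)
            prompt = ET.Element("prompt")
            prompt.text = f"Say '{word}'."
            field.insert(0, prompt)
        elif child.tag == f"{{{CONF}}}grammar":
            # conf:grammar names what the browser must recognize;
            # generate a one-item SRGS grammar for it.
            utterance = child.get("utterance")
            field.remove(child)
            grammar = ET.SubElement(field, "grammar",
                                    {"type": "application/srgs+xml",
                                     "version": "1.0"})
            rule = ET.SubElement(grammar, "rule", {"scope": "public"})
            ET.SubElement(rule, "item").text = utterance
    return field

src = """<field name="test" xmlns:conf="urn:example:conf">
  <conf:speech value="chicago"/>
  <conf:grammar utterance="chicago"/>
</field>"""
field = translate_field(ET.fromstring(src))
print(ET.tostring(field, encoding="unicode"))
```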

When run manually, the dialog between a person and the VoiceXML browser running this application would probably look similar to this:

Action of the Caller (Human)             Browser Action
---------------------------------------  ------------------------------------
Dials VoiceXML browser's phone number    Answers, speaks "Say Chicago"
Responds to prompt with "Chicago"        Recognizes "Chicago", speaks "Pass"
                                         Logs "pass", hangs up phone, exits

The simplest approach to automating this test would be to replace the person in the above scenario with a computer that is capable of speech recognition and text-to-speech, and understands how to respond to prompts such as "Say Chicago" and "Pass".
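That naive driver amounts to a prompt-to-response table. A minimal sketch, in which the prompt strings and the convention that "Pass"/"Fail" ends the call are assumptions for illustration:

```python
# A toy scripted "caller" standing in for the human in the dialog
# above. Prompt texts and the hang-up convention are assumptions.
RESPONSES = {
    "Say Chicago": "Chicago",  # answer the prompt with the expected word
}

def caller_turn(heard: str):
    """Given what the browser just spoke, return the utterance to
    speak back, or None to hang up (the test announced its result)."""
    if heard in ("Pass", "Fail"):
        return None
    return RESPONSES.get(heard)

print(caller_turn("Say Chicago"))  # -> Chicago
print(caller_turn("Pass"))         # -> None
```

As the next paragraph explains, this only works if the driver's speech recognition never mishears the browser's prompt.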

Unfortunately, speech recognition is not perfect, and it's very possible that the computer driving the dialog (simulating the person) would misrecognize the VoiceXML browser's "Say Chicago" prompt, resulting in no response, or worse, the wrong audio being spoken. The a...