System and method for searching audio segments

IP.com Disclosure Number: IPCOM000018947D
Original Publication Date: 2003-Aug-21
Included in the Prior Art Database: 2003-Aug-21
Document File: 2 page(s) / 56K

Publishing Venue

IBM

Abstract

Disclosed is a method for searching pre-recorded audio segments during application development (e.g., Interactive Voice Response applications) to promote reusability and minimize the number of prompts that require recording. Although the technology and methods for searching audio segments are known in the market, there are no known tools that use this technology to reuse pre-recorded prompts in software development.

This is the abbreviated version, containing approximately 51% of the total text.

Automated and self-service applications are part of everyday routines. When calling to check the status of a flight (or to check an account balance, file an accident claim, use the self check-in system at the airport, and in many other situations), there is a very high probability of reaching an Interactive Voice Response (IVR) system or a similar automated system. These systems reduce the cost of service by replacing operators and decreasing call duration, among other factors, which drives customers to seek state-of-the-art technologies that provide high-quality service and maintain (or even improve) end-user satisfaction.

On the other side, development organizations writing these applications invest in tools to improve productivity, shorten development time, promote reusability, facilitate maintenance and so on. The applications described in this disclosure implement a dialog between the machine and the end user. These applications usually use pre-recorded audio segments for the static dialog. For the dynamic dialog, various approaches can be followed, such as text-to-speech technology or splicing pre-recorded audio segments. Usually the development organization contracts a studio or a voice talent to record the sentences and words used in the dialog interface. These studios normally charge a minimum amount per recording session, so developers do not want to contract them to record only one or two audio segments. There are at least 3 situations in which developers need to search for pre-recorded audio segments:

1. During the development phase, after identifying the exact wording of the dialog interface, developers need to search their existing database for phrases or partial phrases previously recorded by the same voice talent, to minimize the number of prompts to record.
2. When a change is implemented in the system and only a few audio segments need re-recording or modification.
3. When new functions are added to existing applications, requiring new audio segments.

The idea disclosed here provides a system and method for searching pre-recorded audio segments in a database and presenting a report to the developer summarizing the findings. Using the report, the developer can determine which audio segments to record for the new application and which ones to reuse.
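As an illustration only, the reuse-or-record report described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: it assumes a text transcript is available for every segment and matches whole phrases by normalized text, whereas the disclosure also covers audio (phoneme) search and partial phrases. The names AudioSegment, normalize and build_report, the file paths, and the talent identifiers are all hypothetical.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class AudioSegment:
    """One pre-recorded prompt with its transcript (hypothetical schema)."""
    path: str
    transcript: str
    voice_talent: str


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so matching ignores formatting."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def build_report(phrases, catalog, voice_talent):
    """Split the requested phrases into reusable and to-be-recorded ones."""
    # Index only the segments recorded by the requested voice talent.
    index = {
        normalize(seg.transcript): seg
        for seg in catalog
        if seg.voice_talent == voice_talent
    }
    report = {"reuse": [], "record": []}
    for phrase in phrases:
        match = index.get(normalize(phrase))
        if match is not None:
            report["reuse"].append((phrase, match.path))
        else:
            report["record"].append(phrase)
    return report


# Hypothetical catalog and dialog wording, for illustration only.
catalog = [
    AudioSegment("prompts/welcome.wav", "Welcome to the airline.", "talent_a"),
    AudioSegment("prompts/goodbye.wav", "Thank you for calling.", "talent_a"),
    AudioSegment("prompts/welcome_b.wav", "Welcome to the airline.", "talent_b"),
]
report = build_report(
    ["welcome to the airline", "Please enter your flight number."],
    catalog,
    voice_talent="talent_a",
)
```

In this sketch the first phrase is matched to an existing recording by the same voice talent, while the second has no match and is listed for recording. A phoneme-based or partial-phrase search would replace the dictionary lookup in build_report.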

There are 3 main components in this system:

1. Database storing the pre-recorded audio segments, organized by voice talent, gender and category.
2. Search engine capable of searching for sentences and words in pre-recorded audio segments, identifying phonemes and retrieving results regardless of spelling or speaker. Optionally, if the text corresponding to the audio segment is available, the search engine can search by text or a combination of audio and text. The search engine could either use a transcription s...