Data Retrieval through a Compact Disk Device having a Speech-Driven Interface

IP.com Disclosure Number: IPCOM000114661D
Original Publication Date: 1995-Jan-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 2 page(s) / 103K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+3]

Abstract

Disclosed is a method for recording data to facilitate speech recognition on a Compact Disk (CD) and for retrieving this data through a multimedia CD player having a speech-driven end-user interface. A simple embodiment of the device provides for attachment to a television set, with operating instructions provided through a telephone-type handset connected by infrared or other wireless means to act as a microphone. In a more sophisticated embodiment, a Compact Disk-Read Only Memory (CD-ROM) device is provided as part of a personal computer, with any type of microphone. In a preferred embodiment, a spread-spectrum or digital wireless portable phone acts as the microphone input to the CD player, while also providing for the programming of a Video Cassette Recorder (VCR).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Phonemes and prompts are recorded on the CD and associated with
each screen or image encoded on the disk.  For each image, this data
provides a context, an active vocabulary within that context, phoneme
labels for each legal phrase or n-gram combination, and a Backus-Naur
Form (BNF) grammar.  On-screen prompts reflect some or all of the
words and phrases active within the context.
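
      As an illustration only, the per-screen data described above
might be modeled as follows.  This is a minimal sketch, not the
disclosed recording format: the names ScreenContext, expand, and
phonemes_for, the "<command>" start symbol, the sample grammar, and
the ARPAbet-style phoneme labels are all assumptions made for the
example.  Phoneme labels are stored per word and concatenated per
phrase as a simplification, and the grammar is assumed to be
non-recursive.

```python
from dataclasses import dataclass, field

# BNF-like grammar: nonterminal -> list of alternatives (each a symbol list).
Grammar = dict[str, list[list[str]]]


def expand(grammar: Grammar, symbol: str) -> list[str]:
    """List every phrase derivable from symbol (non-recursive grammars only)."""
    if symbol not in grammar:
        return [symbol]  # a terminal word stands for itself
    phrases = []
    for alternative in grammar[symbol]:
        partials = [""]
        for part in alternative:
            partials = [
                (prefix + " " + tail).strip()
                for prefix in partials
                for tail in expand(grammar, part)
            ]
        phrases.extend(partials)
    return phrases


@dataclass
class ScreenContext:
    """Hypothetical record attached to one screen or image on the disc."""
    screen_id: int
    grammar: Grammar
    phoneme_labels: dict[str, list[str]]  # word -> phoneme labels
    prompts: list[str] = field(default_factory=list)

    def legal_phrases(self) -> list[str]:
        """Active vocabulary: phrases the grammar allows in this context."""
        return expand(self.grammar, "<command>")

    def phonemes_for(self, phrase: str) -> list[str]:
        """Concatenate the per-word labels of a phrase."""
        return [p for word in phrase.split() for p in self.phoneme_labels[word]]


# Illustrative data for a single screen (all values are made up).
ctx = ScreenContext(
    screen_id=7,
    grammar={
        "<command>": [["next"], ["replay"], ["go", "to", "<topic>"]],
        "<topic>": [["engine"], ["brakes"]],
    },
    phoneme_labels={
        "next": ["N", "EH", "K", "S", "T"],
        "replay": ["R", "IY", "P", "L", "EY"],
        "go": ["G", "OW"],
        "to": ["T", "UW"],
        "engine": ["EH", "N", "JH", "AH", "N"],
        "brakes": ["B", "R", "EY", "K", "S"],
    },
    prompts=["Say: next, replay, or go to <topic>"],
)
```

      Restricting the recognizer to ctx.legal_phrases() for the
screen currently displayed is one way to read the "active vocabulary
within that context" idea: fewer candidate phrases generally makes
recognition easier.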

      The data of a CD may be viewed as a collection of numbered or
indexed objects.  Although these objects need not be viewed in strict
beginning-to-end order, only certain paths between objects are
typically legal or probable.  These objects are generally indexed
by topic, formed into a pyramid structure of paths, or formed into a
pseudo-linear structure.  In a pyramid structure, the user is asked a
series of questions, or multiple fields of data are provided as
input, to determine a specific object or subset of objects for
display.  Pyramid structures include medical and automotive
diagnostics and some educational programs.  In a pseudo-linear
structure, speech recognition may provide a "page turner" function,
advancing the system from one object to another.
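
      The navigation structures described above can be sketched
roughly as follows.  The object numbers, the question tree, the
function names (can_move, descend, turn_page), and the spoken phrases
"next page" and "previous page" are hypothetical and chosen only for
the example; the disclosure does not specify them.

```python
# Legal paths: object index -> indices that may be displayed next.
LEGAL_PATHS = {
    0: [1, 2],
    1: [3],
    2: [3],
    3: [],
}


def can_move(current: int, target: int) -> bool:
    """True if moving from current to target follows a legal path."""
    return target in LEGAL_PATHS.get(current, [])


# Pyramid structure: each answer narrows the search until an object
# index (an int) is reached.  Loosely modeled on a diagnostic dialogue.
PYRAMID = {
    "question": "Which system has the problem?",
    "answers": {
        "engine": {
            "question": "Does the engine start?",
            "answers": {"yes": 12, "no": 13},
        },
        "brakes": 20,
    },
}


def descend(node, answers):
    """Follow spoken answers down the pyramid; return the node reached."""
    for answer in answers:
        if isinstance(node, int):
            break  # already at an object, ignore further answers
        node = node["answers"][answer]
    return node


def turn_page(current: int, phrase: str, last: int) -> int:
    """Pseudo-linear "page turner": step forward or back, staying in range."""
    if phrase == "next page":
        return min(current + 1, last)
    if phrase == "previous page":
        return max(current - 1, 0)
    return current  # unrecognized phrase: stay on the same object
```

      For example, descend(PYRAMID, ["engine", "no"]) reaches object
13, while turn_page(2, "next page", 5) advances from object 2 to
object 3.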

      An object may have a mixed content, including, for example,
sound, still images, text, video, and three-dimensional holography.
Within each object, there may be a specialized command vocabulary,
including words and phrases such as "replay," "backup," "skip,"
"enlarge image," "freeze-frame," "increase volume," and "change
language."  Multiple phoneme models and language models may be stored
on a single CD to enable multilingual usage, and multiple phoneme
sets may be stored to accommodate users of different accents,
dialects, ages, and sexes.

      When the contents of a CD...