
Mechanism to add speech recognition to image data for image sequence control

IP.com Disclosure Number: IPCOM000013121D
Original Publication Date: 2003-Jun-13
Included in the Prior Art Database: 2003-Jun-13
Document File: 1 page(s) / 41K

Publishing Venue

IBM

Abstract

Image data file and video formats are enhanced with finite speech grammars. The speech grammars allow the user to control the navigation of a remote camera, which can be either real or virtual. For a real camera, the speech grammars capture the user's commands to position or move the camera when processed by an Automated Speech Recognizer (ASR) system. For a virtual camera, the speech grammars capture the user's commands to move through a sequence of images that are stored in a remote database and represent different locations in a virtual world.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.



A sequence of camera images can be used to remotely explore rooms of a building, a garden walk, or an imaginary environment. These images originate from either a video or still digital camera, or from an image database on a remote computer. The sequence of images is displayed to a user via a kiosk, handheld device (PDA), or computer screen. The user can issue spoken commands to the device. The device, whether kiosk, PDA, or computer screen, supports speech recognition, although the speech recognition engine may be running on a remote server.

The user issues commands to direct the camera motion in a particular direction, where the camera may either stay in place or move. Basic motions for a static camera include pan right, left, up, and down, and zoom in and out. Motions for a moving camera include entering a room, selecting one path to walk from several, jumping over a barrier, etc. For both a static and a moving camera, the directions can be called image sequence exploration, as the user is directing the next sequence of images to be displayed by the device. Applications that can use image sequence exploration include real estate for showing rooms of a house, museums for showing walk-throughs of art or science exhibits, and games for moving through an adventure. These applications will make use of both the static and moving camera directions. Moreover, each application will have its own set of directions. A museum walk-through may accept a command to go to a specific exhibit, for example.
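The static-camera motions listed above could be modeled roughly as follows. This is a minimal sketch, not the disclosure's implementation: the class name `StaticCamera`, the command phrases, and the step sizes are all assumptions made for illustration, and the input is taken to be a phrase already recognized by an ASR system.

```python
from dataclasses import dataclass

# Hypothetical step sizes; the disclosure does not specify any values.
PAN_STEP_DEG = 10.0
ZOOM_STEP = 0.5


@dataclass
class StaticCamera:
    """Orientation and zoom of a camera that stays in place (illustrative only)."""
    pan_deg: float = 0.0   # positive values point right
    tilt_deg: float = 0.0  # positive values point up
    zoom: float = 1.0      # 1.0 means no magnification

    def apply(self, command: str) -> None:
        """Apply one recognized phrase; raise ValueError for an unknown phrase."""
        if command == "pan right":
            self.pan_deg += PAN_STEP_DEG
        elif command == "pan left":
            self.pan_deg -= PAN_STEP_DEG
        elif command == "pan up":
            self.tilt_deg += PAN_STEP_DEG
        elif command == "pan down":
            self.tilt_deg -= PAN_STEP_DEG
        elif command == "zoom in":
            self.zoom += ZOOM_STEP
        elif command == "zoom out":
            # Never zoom out past the unmagnified view.
            self.zoom = max(1.0, self.zoom - ZOOM_STEP)
        else:
            raise ValueError(f"unrecognized command: {command!r}")
```

A moving camera would need additional commands (entering a room, choosing a path) whose validity depends on the scene, which is why they are not part of this fixed set.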

The user directs the motion of the camera with commands which vary according to the image currently displayed. A command to enter the room on the right is valid only if the current image displays a room on the right. Camera control using speech requires a set of speech recognition grammars where only some grammars in the set are active for a currently displayed ima...
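The idea that only some commands are valid for the currently displayed image can be sketched as a lookup from image to active phrases. This is an illustrative sketch under stated assumptions: the image identifiers, phrases, and function names are hypothetical, and plain sets of phrases stand in for real speech recognition grammars.

```python
from typing import Optional

# Hypothetical phrases that stay active for every image (static-camera motions).
ALWAYS_ACTIVE = {"pan left", "pan right", "zoom in", "zoom out"}

# Hypothetical per-image phrases, each mapped to the image shown next.
ACTIVE_BY_IMAGE = {
    "hallway": {
        "enter the room on the right": "bedroom",
        "go forward": "kitchen",
    },
    "bedroom": {"go back": "hallway"},
    "kitchen": {"go back": "hallway"},
}


def active_commands(image_id: str) -> set:
    """Return every phrase the recognizer should accept for this image."""
    return ALWAYS_ACTIVE | set(ACTIVE_BY_IMAGE.get(image_id, {}))


def next_image(image_id: str, phrase: str) -> Optional[str]:
    """Return the image to display after the phrase, or None if it is not active."""
    if phrase in ALWAYS_ACTIVE:
        # A static-camera motion keeps the user at the same location.
        return image_id
    return ACTIVE_BY_IMAGE.get(image_id, {}).get(phrase)
```

With this table, "enter the room on the right" leads from the hallway image to the bedroom image, while the same phrase spoken at the bedroom image is rejected because no such room is shown there.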