
Method to get text information from images in a web page
Disclosure Number: IPCOM000015713D
Original Publication Date: 2002-Jul-01
Included in the Prior Art Database: 2003-Jun-21

Publishing Venue



Disclosed here is a new method to get text information from images in a web page. Machine translation software and machine reading software can handle text information in a web page after obtaining the information directly from the HTML file or through a system API. But they cannot handle text information included as part of an image in a web page; for images they can use only the ALT attribute of each image tag or a hyperlink reference. This method enables a user to obtain text information from image data in a web page. It can be used as a pre-process for machine translation software and machine reading software. To get the text information, this method uses an Optical Character Reader (OCR) and OCR dictionaries. The method updates the OCR dictionaries dynamically with the information in the web page. There are two types of OCR dictionary: the "OCR Page Dictionary" and the "OCR Image Dictionary". The OCR Page Dictionary consists of page information and is made for each page. The OCR Image Dictionary consists of image information and related tag information, and is made for each image.
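
The two dictionary types described above could be built roughly as in the following sketch. It is an illustration under stated assumptions, not the disclosed implementation: the disclosure does not say what "page information" or "related tag information" contain, so the sketch assumes the page dictionary holds the words of the visible page text and each image dictionary holds words taken from the image's ALT attribute, its file name, and the link that encloses it. The names `PageScanner`, `build_ocr_dictionaries`, and `words` are hypothetical, and the OCR step itself is left out.

```python
import re
from html.parser import HTMLParser

# Assumption: a "word" is a run of ASCII letters or digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def words(text):
    """Return the set of lowercased words found in text."""
    return {w.lower() for w in WORD_RE.findall(text)}


class PageScanner(HTMLParser):
    """Collects candidate words for the two hypothetical dictionaries."""

    def __init__(self):
        super().__init__()
        self.page_words = set()  # assumed content of the OCR Page Dictionary
        self.images = []         # one assumed OCR Image Dictionary per image
        self._href_stack = []    # hrefs of the currently open <a> elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href_stack.append(attrs.get("href") or "")
        elif tag == "img":
            alt = attrs.get("alt") or ""
            src = attrs.get("src") or ""
            href = self._href_stack[-1] if self._href_stack else ""
            # Image information plus related tag information (assumed fields).
            self.images.append({
                "src": src,
                "words": words(alt) | words(src) | words(href),
            })

    def handle_endtag(self, tag):
        if tag == "a" and self._href_stack:
            self._href_stack.pop()

    def handle_data(self, data):
        # Text between tags feeds the per-page dictionary.
        self.page_words |= words(data)


def build_ocr_dictionaries(html):
    """Return (page_dictionary, list_of_image_dictionaries) for one page."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.page_words, scanner.images
```

In such a design, an OCR engine reading a given image would presumably consult that image's dictionary first and the page dictionary second, since words near an image are likely candidates for the text drawn inside it. How the dictionaries are actually weighted or combined is not stated in the disclosure.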